Handling Nulls in Python: A Hands-On Tutorial
A null value means that no value has been assigned to a variable. In data science and data analysis, missing values are a common but challenging problem: they make data harder to analyze and can lead to inaccurate results.
So, in this tutorial, we will cover the following aspects of Nulls in Python.
- What is Null in Python?
- Null vs None
- Use of None in Python
What is Null in Python?
In Python, we use None in place of Null, but there is a slight difference between them. Let’s explore each one separately.
Null vs None
None is the value Python uses to represent the absence of a value. Setting a variable to None does not leave it pointing at nothing: the variable refers to the single None object, which has the type NoneType. We generally use None to initialize a variable that doesn’t have a meaningful value yet. None is not equal to 0, False, or an empty string.
Null in other languages such as Java, JavaScript, and SQL also represents missing data. The main difference is that null is a general concept, whereas None is a concrete object in Python.
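To make the distinction concrete, the short sketch below checks a few properties that follow from None being a real object. The variable names are illustrative only.

```python
# None is a single shared object of type NoneType
a = None
b = None

same_object = a is b                # both names refer to the one None object
type_name = type(None).__name__     # 'NoneType'

# None is falsy, but it is not equal to other "empty" values
is_falsy = not None
equals_zero = (None == 0)
equals_false = (None == False)
equals_empty = (None == "")

print(same_object, type_name, is_falsy)         # True NoneType True
print(equals_zero, equals_false, equals_empty)  # False False False
```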
Use of None in Python
To gain a better understanding of None in Python, we’ll proceed with the following steps:
- Declaring Null Variable
- None as Default parameter
- Detecting Null Variables
- Dealing with Nulls in Python
1. Declaring Null Variable
We can declare a null variable in Python by assigning it the value None, as follows.
x_variable = None
We can also check the data type of the variable using the built-in function type().
type(x_variable)
Output: Here we can see that the data type of x_variable is NoneType.
NoneType
If we try to print a variable that has never been assigned a value, we get a NameError because the variable is undefined. This is different from a variable that has been set to None.
print(y_variable)
Output:
----> 1 print(y_variable)
NameError: name 'y_variable' is not defined
2. None as Default parameter
When we define functions in Python, we can use None as a default parameter value to handle optional input. This makes our functions more flexible and reusable.
Here is a simple example of how we can use None as a default parameter:
def greet(name=None):
    # Check if 'name' is None
    if name is None:
        # If 'name' is None, print its type
        print(type(name))
    # Print a greeting message including the 'name'
    print(f"Hello, {name}!")

# Call the function with and without a parameter
greet()
greet("John")
Output:
<class 'NoneType'>
Hello, None!
Hello, John!
When we execute this code, calling greet() first prints the type of name and then Hello, None! because we did not provide an argument. Calling greet("John") prints Hello, John! as a personalized greeting because we passed the name John as an argument.
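One common reason to use None as a default is to stand in for a mutable value such as a list, because a default list is created once and then shared across calls. The sketch below illustrates the pattern; add_item and its parameters are hypothetical names, not part of the example above.

```python
def add_item(item, items=None):
    # Create a fresh list on each call when the caller passes nothing
    if items is None:
        items = []
    items.append(item)
    return items

first = add_item("a")
second = add_item("b")
print(first)   # ['a']
print(second)  # ['b']
```

Because a new list is created inside the function, the second call does not see the item added by the first.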
3. Detecting Null Variables
To test whether a variable is None, we can use the is operator. To demonstrate, we will initialize two variables, one without a value and one with a value.
# variable with no value
x_variable = None
# variable with value
y_variable = 8
Let’s apply the is operator to check them as follows:
# Check if x_variable is None
if x_variable is None:
    print("x_variable is None")
else:
    print("The Value of x_variable is:", x_variable)

# Check if y_variable is None
if y_variable is None:
    print("y_variable is None")
else:
    print("The Value of y_variable is:", y_variable)
This code prints "x_variable is None" because x_variable was set to None, and it prints the value of y_variable because y_variable holds 8.
Output:
x_variable is None
The Value of y_variable is: 8
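The is operator is generally preferred over == for this check because a class can override ==, while is compares object identity. A minimal sketch, using a hypothetical AlwaysEqual class purely for illustration:

```python
class AlwaysEqual:
    # Hypothetical class whose == comparison always succeeds
    def __eq__(self, other):
        return True

obj = AlwaysEqual()

equality_check = (obj == None)   # True, although obj is not None
identity_check = (obj is None)   # False, the reliable answer

print(equality_check, identity_check)  # True False
```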
4. Dealing with Nulls in Python
There are several ways to deal with null in Python. Here we will discuss some of the most common and effective methods.
Dropping null values
Removing rows or columns that contain null values is the most common approach to handling missing data when there are only a few null values. To do this, we can use the dropna() function in Pandas as follows.
import pandas as pd
# Create a Data Frame with null values
df = pd.DataFrame({'age': [25, None, 30, 32],
'name': ['John', 'Bob', None, 'Maverick']})
df.head()
Output: Here we have a data table having null values.
# Drop rows with at least one null value
df = df.dropna()
# Print the Data Frame
print(df)
Output: The rows containing null values have now been removed using the dropna() function.
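Before dropping anything, it can help to count the nulls and to limit what dropna() removes. The sketch below rebuilds the same sample data and uses the subset and axis parameters of dropna(); the result variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'age': [25, None, 30, 32],
                   'name': ['John', 'Bob', None, 'Maverick']})

# Count the null values in each column before dropping anything
null_counts = df.isna().sum()

# Drop rows only when 'age' is null; a null 'name' is kept
rows_with_age = df.dropna(subset=['age'])

# Drop columns (instead of rows) that contain any null value
no_null_columns = df.dropna(axis=1)

print(null_counts)
print(rows_with_age)
print(no_null_columns.shape)
```

Here both columns contain a null, so dropping by column leaves no columns at all, which shows how aggressive dropna() can be on small data sets.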
Filling Null Values
In this method, we fill the missing data with the mean, the mode, or another specific value. For categorical data, we can impute nulls with the most common value in the column. For this, we can use the fillna() function in Pandas as follows.
# Replaces null values with 0
df.fillna(0)
# Replaces null values in numeric columns with the mean of each column
df.fillna(df.mean(numeric_only=True))
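As a rough illustration of filling each column with a value suited to its type, the sketch below uses the mean for the numeric column and the mode for the categorical one. The sample data is a hypothetical variant of the earlier Data Frame, with 'John' repeated so that the mode is unambiguous.

```python
import pandas as pd

df = pd.DataFrame({'age': [25, None, 30, 32],
                   'name': ['John', 'Bob', None, 'John']})

# Numeric column: fill with the column mean
age_mean = df['age'].mean()
df['age'] = df['age'].fillna(age_mean)

# Categorical column: fill with the most frequent value (the mode)
name_mode = df['name'].mode()[0]
df['name'] = df['name'].fillna(name_mode)

print(df)
```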
Interpolation
To fill gaps in time-series data or continuous variables, we can use interpolation, which estimates null values from nearby data points.
import pandas as pd
df = pd.DataFrame({'age': [25, None, 30]})
# Interpolate the null value in the 'age' column
df['age'] = df['age'].interpolate()
# Print the Data Frame
print(df)
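For time-series data with unevenly spaced timestamps, the default linear interpolation and time-based interpolation can give different estimates. The sketch below uses hypothetical readings and dates to compare the two; method='time' requires a datetime index.

```python
import pandas as pd

# Hypothetical readings on unevenly spaced dates
dates = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-04'])
readings = pd.Series([10.0, None, 40.0], index=dates)

# Default linear interpolation treats the rows as evenly spaced
by_position = readings.interpolate()

# method='time' weights the estimate by the actual time gaps
by_time = readings.interpolate(method='time')

print(by_position.iloc[1], by_time.iloc[1])  # 25.0 20.0
```

The missing reading falls one day into a three-day gap, so the time-based estimate sits a third of the way between 10 and 40 rather than halfway.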
Imputation
We can also fill null values with Scikit-Learn, which provides imputers based on statistical or machine learning methods.
from sklearn.impute import SimpleImputer
# Create an instance of the SimpleImputer class that fills nulls with the column mean
imputer = SimpleImputer(strategy='mean')
# Fit the imputer and impute the null values in the numeric 'age' column in one step
df[['age']] = imputer.fit_transform(df[['age']])
For now, this covers the basics of handling nulls in Python. There are many other techniques available, but their suitability depends on the specific data and how and where you want to use them.