Maho Takahashi

Linguistics PhD

research

CV

code

mtakahas[at]ucsd[dot]edu

how to find and deal with NaN values in a pandas dataframe

import pandas as pd
import numpy as np
df = pd.DataFrame({'Date':['10/2/2011', np.nan, '11/2/2011', '12/2/2011', '13/2/2011'],
                   'Product':['umbrella', 'matress', 'badminton', 'shuttle', np.nan],
                   'Updated_Price':[np.nan, 1250, 1450, 1550, 400],
                   'Discount':[10, 8, np.nan, 15, 10]})
df
Date Product Updated_Price Discount
0 10/2/2011 umbrella NaN 10.0
1 NaN matress 1250.0 8.0
2 11/2/2011 badminton 1450.0 NaN
3 12/2/2011 shuttle 1550.0 15.0
4 13/2/2011 NaN 400.0 10.0

display rows with NaN values from any of the columns

df[df.isnull().any(axis=1)]
Date Product Updated_Price Discount
0 10/2/2011 umbrella NaN 10.0
1 NaN matress 1250.0 8.0
2 11/2/2011 badminton 1450.0 NaN
4 13/2/2011 NaN 400.0 10.0

display number of rows with NaN values per column

df.isnull().sum()
Date             1
Product          1
Updated_Price    1
Discount         1
dtype: int64

fill NaN values with the mean of a column they belong

cols = ['Updated_Price', 'Discount']
df[cols] = df[cols].fillna(df.mean()) 
df
Date Product Updated_Price Discount
0 10/2/2011 umbrella 1162.5 10.00
1 NaN matress 1250.0 8.00
2 11/2/2011 badminton 1450.0 10.75
3 12/2/2011 shuttle 1550.0 15.00
4 13/2/2011 NaN 400.0 10.00

fill NaN values with the mean of the values before and after NaN

df['Discount'] = df['Discount'].interpolate()
df
Date Product Updated_Price Discount
0 10/2/2011 umbrella NaN 10.0
1 NaN matress 1250.0 8.0
2 11/2/2011 badminton 1450.0 11.5
3 12/2/2011 shuttle 1550.0 15.0
4 13/2/2011 NaN 400.0 10.0

fill NaN values with the value of the previous row

df['Date'] = df['Date'].fillna(method='ffill')
df
Date Product Updated_Price Discount
0 10/2/2011 umbrella NaN 10.0
1 10/2/2011 matress 1250.0 8.0
2 11/2/2011 badminton 1450.0 11.5
3 12/2/2011 shuttle 1550.0 15.0
4 13/2/2011 NaN 400.0 10.0

delete rows with NaN values

df.dropna(subset=['Product'], inplace=True)
df
Date Product Updated_Price Discount
0 10/2/2011 umbrella 1162.5 10.00
1 10/2/2011 matress 1250.0 8.00
2 11/2/2011 badminton 1450.0 10.75
3 12/2/2011 shuttle 1550.0 15.00