I have some NaN values in a column of a DataFrame. With pd.isnull() I can select these rows (see below).
However, when I try to use a function to assign a new value to these null values, it doesn't work (see below). What is the right way to identify a null value inside a function?
I know I could use fillna to replace every null with an empty string and then change the if statement to if str(x) == '', but I am trying to understand how Python (not pandas) treats NaN. Thanks.
Below is the code:
import pandas as pd

df[df['comment'].isnull()]

def new_column(x):
    if str(x) is None:
        return 'no value'
    else:
        return x

df['test'] = df.apply(lambda row: new_column(row['comment']), axis=1)
CodePudding user response:
Inside a function, you can use pd.isna:
df['test'] = df['comment'].apply(lambda x: 'no value' if pd.isna(x) else x)
# Or
def new_column(x):
    if pd.isna(x):
        return 'no value'
    else:
        return x
df['test'] = df['comment'].apply(new_column)
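As for how plain Python treats NaN: the check in the question can never succeed, because str(x) always returns a string (for NaN it is the string 'nan'), and a string is never None. NaN itself is an ordinary float that compares unequal to everything, including itself, so the standard-library test is math.isnan. Here is a minimal sketch without pandas; the helper name is_missing is hypothetical and only for illustration:

```python
import math

nan = float("nan")

# str() always returns a string, so the result can never be None.
text = str(nan)             # the string 'nan'
never_none = text is None   # always False

# NaN is a float that is not equal to itself.
self_equal = nan == nan     # False

def is_missing(x):
    """Hypothetical helper: True for None or a float NaN."""
    return x is None or (isinstance(x, float) and math.isnan(x))

def new_column(x):
    # Same intent as the function in the question, with a working check.
    return 'no value' if is_missing(x) else x
```

The isinstance guard matters because math.isnan raises a TypeError when it is given a string, and the column here mixes strings with NaN. pd.isna handles both cases for you, which is why it is usually the simpler choice.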
There are also several "vectorized" alternatives that avoid calling a Python function per row.
Use fillna:
df['test'] = df['comment'].fillna('no value')
Or use np.where (this requires import numpy as np):
df['test'] = np.where(df['comment'].isna(), 'no value', df['comment'])
Or use where from pandas:
df['test'] = df['comment'].where(df['comment'].notna(), other='no value')
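To see that these approaches agree, here is a small sketch on made-up data. The sample DataFrame and the variable names are assumptions for illustration, not the asker's real data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({'comment': ['good', np.nan, 'bad', None]})

# Element-wise check with pd.isna inside apply.
via_apply = df['comment'].apply(lambda x: 'no value' if pd.isna(x) else x)

# Vectorized alternatives.
via_fillna = df['comment'].fillna('no value')
via_np_where = pd.Series(
    np.where(df['comment'].isna(), 'no value', df['comment']),
    index=df.index,
)
via_where = df['comment'].where(df['comment'].notna(), other='no value')

expected = ['good', 'no value', 'bad', 'no value']
results = [s.tolist() for s in (via_apply, via_fillna, via_np_where, via_where)]
```

Each entry in results should equal expected. Note that both NaN and None are treated as missing by pd.isna, isna, notna, and fillna.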