I have a dataframe whose category column contains both float and NaN values. I want to convert all the float values to integers. To do that, I tried:
df['category'] = df['category'].apply(lambda x:int(x) if np.isnan(x)==False else x)
Unfortunately, this code has no visible effect. Why is that, and how can I modify it to do what I want?
Thank you
CodePudding user response:
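NumPy integer dtypes cannot represent None/NaN, so pandas stores a numeric series that contains missing values as float64. Your lambda does return ints, but because the NaN values are still in the result, pandas converts everything back to float: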
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": [0.51, None, 8.0, 7, 0.0, -89, np.nan]})  # np.NaN was removed in NumPy 2.0; use np.nan
df.A.apply(lambda x: int(x) if not np.isnan(x) else x).apply(type)
0 <class 'float'>
1 <class 'float'>
2 <class 'float'>
3 <class 'float'>
4 <class 'float'>
5 <class 'float'>
6 <class 'float'>
Name: A, dtype: object
# With a non-numeric placeholder the result has object dtype, so the ints survive:
df.A.apply(lambda x: int(x) if not np.isnan(x) else 'FOO').apply(type)
0 <class 'int'>
1 <class 'str'>
2 <class 'int'>
3 <class 'int'>
4 <class 'int'>
5 <class 'int'>
6 <class 'str'>
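If the goal is to keep the missing values and still have integers, one option is pandas' nullable integer dtype. This is a minimal sketch, assuming pandas 0.24 or later and reusing column A from the example above. Using np.trunc is my own choice to mimic int(x), which drops the fractional part; without it, a value such as 0.51 cannot be cast safely.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0.51, None, 8.0, 7, 0.0, -89, np.nan]})

# np.trunc drops the fractional part (like int(x)) and leaves NaN untouched.
# Casting to the nullable "Int64" dtype (capital I) then stores NaN as <NA>.
df["A"] = np.trunc(df["A"]).astype("Int64")

print(df["A"].dtype)            # Int64
print(df["A"].isna().tolist())  # [False, True, False, False, False, False, True]
```

The missing entries stay missing, and the remaining values are real integers rather than floats.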
CodePudding user response:
Try this (note that it converts np.nan to zero):
df["category"] = np.nan_to_num(df['category']).astype(int)
Example:
import numpy as np
import pandas as pd
df = pd.DataFrame({"category": [1.51, None, 8.0, 7.0, 0.0, -89, np.nan]})  # np.nan replaces np.NaN, removed in NumPy 2.0
df["category"] = np.nan_to_num(df['category']).astype(int)
print(df["category"])
Output:
0 1
1 0
2 8
3 7
4 0
5 -89
6 0
Name: category, dtype: int64
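Keep in mind that after nan_to_num a missing value can no longer be told apart from a real 0 (row 4 above). A rough pandas-only equivalent follows, assuming it is acceptable to record the missing rows in a separate mask first. The name was_missing is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"category": [1.51, None, 8.0, 7.0, 0.0, -89, np.nan]})

# Remember which rows were missing before they are overwritten.
was_missing = df["category"].isna()

# fillna(0) replaces NaN with 0; astype(int) then truncates toward zero.
df["category"] = df["category"].fillna(0).astype(int)

print(df["category"].tolist())  # [1, 0, 8, 7, 0, -89, 0]
print(was_missing.tolist())     # [False, True, False, False, False, False, True]
```

The mask can later be used to restore or filter the rows that were originally NaN.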