I have a column in my dataframe that contains NaN values and int values. The original dtype was float64, but I tried to change it to an integer dtype and set the missing values to np.nan. Now I get this error when trying to run imputation on it: TypeError: float() argument must be a string or a number, not 'NAType'. In the following table, the column is similar to "age".
import numpy as np
import pandas as pd

data = {'name': ['Alex', 'Ben', 'Marry', 'Alex', 'Ben', 'Marry'],
        'job': ['teacher', 'doctor', 'engineer', 'teacher', 'doctor', 'engineer'],
        'age': [27, 32, 78, 27, 32, 78],
        'weight': [160, 209, 130, 164, 206, 132],
        'date': ['6-12-2022', '6-12-2022', '6-12-2022', '6-13-2022', '6-13-2022', '6-13-2022']
        }
df = pd.DataFrame(data)
df
| |name |job |age |weight |date
|---|-------|-----------|---|-------|--------
|0 |Alex |teacher |27 |160 |6-12-2022
|1 |Ben |doctor |32 |209 |6-12-2022
|2 |Marry |engineer |78 |130 |6-12-2022
|3 |Alex |teacher |27 |164 |6-13-2022
|4 |Ben |doctor |32 |206 |6-13-2022
|5 |Marry |engineer |78 |132 |6-13-2022
|6 |Alex |teacher |NaN|NaN |6-14-2022
|7 |Ben |doctor |NaN|NaN |6-14-2022
|8 |Marry |engineer |NaN|NaN |6-14-2022
And this is what I tried:
df['age']=df['age'].astype( dtype={'age': pd.Int8Dtype()})
df.loc[df.age== '<NA>', 'age']=np.nan
Is there any way to change float64 to a smaller datatype without causing this issue? Please advise, thanks.
CodePudding user response:
Use
df['age'] = df['age'].astype(dtype='Int64')
with the extension dtype Int64 (with a capitalized I) rather than the default dtype int64 (lowercase i). The latter throws an IntCastingNaNError while the former works smoothly.
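To make the difference concrete, here is a minimal sketch. The small `age` Series is made up to mirror the column in the question, and the `Int8` line is only an illustration of a smaller nullable dtype; it assumes every value fits in the range -128..127.

```python
import numpy as np
import pandas as pd

# Hypothetical float64 column with missing values, mirroring "age" in the question.
age = pd.Series([27.0, 32.0, 78.0, np.nan, np.nan], name="age")

# Nullable extension dtype: missing entries become pd.NA, the rest stay integers.
age_nullable = age.astype("Int64")
print(age_nullable.dtype)         # Int64
print(age_nullable.isna().sum())  # 2

# A smaller nullable dtype also works when every value fits its range
# (Int8 holds -128..127).
age_small = age.astype("Int8")

# The NumPy-backed int64 cannot represent NaN, so this conversion raises.
# IntCastingNaNError is a subclass of ValueError.
try:
    age.astype("int64")
except ValueError as exc:
    print(type(exc).__name__)
```

Note that a column like "weight" in the question has values above 127, so it would need at least `Int16`.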
This functionality was added in pandas 0.24 and is mentioned in this thread.
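As for the TypeError in the question: the nullable dtypes store missing values as pd.NA, and pd.NA cannot be converted with float(). I can't tell from the question which imputer was used, so the following is only a sketch of one possible workaround: keep the nullable dtype for storage and convert to a float array with np.nan right before imputation. The `age` Series is again a made-up stand-in, and float32 is just one choice of a smaller float type.

```python
import numpy as np
import pandas as pd

# Hypothetical nullable integer column; None is stored as pd.NA.
age = pd.Series([27, 32, 78, None, None], dtype="Int8", name="age")

# pd.NA cannot be turned into a Python float, which matches the error in the question.
try:
    float(pd.NA)
except TypeError as exc:
    print(exc)

# Convert explicitly, mapping pd.NA to np.nan, before handing the data
# to code that expects a plain float array.
age_float = age.to_numpy(dtype="float32", na_value=np.nan)
print(age_float.dtype)  # float32
```

Trying to assign np.nan into the nullable column, as in the question's second line, does not help, because the column keeps representing missing values as pd.NA.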