TypeError: float() argument must be a string or a number, not 'NAType'

Time:08-02

I have a column in my dataframe that contains NaN values and int values. The original dtype was float64, but I tried to change it to a smaller integer dtype and set the missing values to np.nan. Now I get this error when I run imputation on it: TypeError: float() argument must be a string or a number, not 'NAType'. In the following table, the column is similar to "age".

import numpy as np
import pandas as pd

data = {'name':  ['Alex', 'Ben', 'Marry','Alex', 'Ben', 'Marry'],
        'job': ['teacher', 'doctor', 'engineer','teacher', 'doctor', 'engineer'],
        'age': [27, 32, 78,27, 32, 78],
        'weight': [160, 209, 130,164, 206, 132],
        'date': ['6-12-2022', '6-12-2022', '6-12-2022','6-13-2022', '6-13-2022', '6-13-2022']
        }

df = pd.DataFrame(data)
df

|   |name   |job        |age|weight |date
|---|-------|-----------|---|-------|--------
|0  |Alex   |teacher    |27 |160    |6-12-2022
|1  |Ben    |doctor     |32 |209    |6-12-2022
|2  |Marry  |engineer   |78 |130    |6-12-2022
|3  |Alex   |teacher    |27 |164    |6-13-2022
|4  |Ben    |doctor     |32 |206    |6-13-2022
|5  |Marry  |engineer   |78 |132    |6-13-2022
|6  |Alex   |teacher    |NaN|NaN    |6-14-2022
|7  |Ben    |doctor     |NaN|NaN    |6-14-2022
|8  |Marry  |engineer   |NaN|NaN    |6-14-2022

and this is what I tried:

df['age']=df['age'].astype( dtype={'age': pd.Int8Dtype()})
df.loc[df.age== '<NA>', 'age']=np.nan

Is there any way to change float64 to smaller datatype without causing this issue? Please advise, thanks

CodePudding user response:

Use

df['age'] = df['age'].astype(dtype='Int64')

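Note the capital I: Int64 is the pandas nullable integer extension dtype, whereas int64 (lowercase i) is the default NumPy dtype. The latter cannot hold missing values and raises an IntCastingNaNError when the column contains NaN, while the former stores them as pd.NA and the cast succeeds.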
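To make the difference concrete, here is a minimal sketch. The series below is a hypothetical stand-in for the "age" column (it is not the asker's actual data), and the except clause assumes the casting error is a ValueError subclass, which is the case for IntCastingNaNError in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the "age" column: float64 with missing values.
age = pd.Series([27.0, 32.0, 78.0, np.nan, np.nan], name="age")

# Nullable extension dtype: the cast succeeds and each NaN becomes pd.NA.
age_nullable = age.astype("Int64")
print(age_nullable)

# Plain NumPy dtype: the cast fails because int64 cannot represent NaN.
try:
    age.astype("int64")
    plain_cast_failed = False
except (ValueError, TypeError) as exc:
    plain_cast_failed = True
    print(type(exc).__name__, exc)
```

The same pattern should apply to the smaller nullable dtypes such as "Int8" or "Int16", as long as every value fits in the chosen range.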

Nullable integer dtypes were added in pandas 0.24 and are mentioned in this thread.
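Regarding the original TypeError: the message suggests that the imputation step calls float() on each value, which pd.NA does not support. One possible workaround, sketched below under that assumption, is to cast the nullable column back to float64 for the imputation (pd.NA becomes np.nan again) and only then convert the result to a small nullable integer dtype. The median fill here is just a placeholder for whatever imputer is actually used, and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical "age" column already stored as a nullable Int8 with pd.NA.
age = pd.Series([27, 32, 78, None, None], dtype="Int8", name="age")

# Casting to float64 turns every pd.NA back into np.nan,
# which float-based tools can handle.
age_float = age.astype("float64")

# Placeholder for the real imputation step: fill gaps with the median.
imputed = age_float.fillna(age_float.median())

# After imputation there are no missing values left, so the column
# can be stored in a small nullable integer dtype again.
age_small = imputed.astype("Int8")
print(age_small)
```

If the imputer returns non-integer values (a mean, for example), they would need to be rounded before the final cast.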
