Im trying to find a "NaN safe" way to convert a column of basically whole numbers. I do not want NaN's to get converted into 0's as there is a difference (in my context) between a zero and a NaN. When i try to convert the original columns using astype('Int64')
it goes through no problemo. But when i use an agg function such as mean()
the resulting column wont convert to ints even though there are no NaNs in my MWE.
I tried many of the suggestions on this post to no avail: Convert Pandas column containing NaNs to dtype `int`
MWE and Results
#Sample Data
df = pd.DataFrame(data=[[123.0, np.NaN, 456.0], [987.0, 876.0,np.NaN]],
columns=['2018', '2019', '2020'],
index=['sales','support'])
df['avg'] = df.mean(axis=1)
df
2018 2019 2020 avg
sales 123.0 NaN 456.0 289.5
support 987.0 876.0 NaN 931.5
#an original column converts no problem and keeps the NaNs
ts = df['2019'].astype('Int64') # --> works fine
ts[0] #--> <NA>
type(ts[0]) # --> pandas._libs.missing.NAType
#column types appear to be the same
type(df['2018'][0]) # --> np.float64
type(df['avg'][0]) # --> np.float64
#craps out on agg created column
df['avg'].astype(float).astype('Int64') # --> fails with message " cannot safely cast non-equivalent float64 to int64"
df['avg'].astype(float).astype('Int64') # --> fails with message " cannot safely cast non-equivalent float64 to int64"
df['avg'].dtype(pd.Int64Dtype) # --> fails with message "'numpy.dtype[float64]' object is not callable"
df['avg'].dtype(pd.Int64Dtype()) # --> fails with message "'numpy.dtype[float64]' object is not callable"
pd.to_numeric(df['avg']).astype('Int64', errors='ignore') # --> works but leaves me with a column of floats
round(df['avg'], 0) # --> remains a float even though it rounds off the decimals 932.0
CodePudding user response:
Given:
2018 2019 2020
sales 123.0 NaN 456.0
support 987.0 876.0 NaN
Doing:
df = df.astype('Int64')
# If you just want truncation:
df['avg'] = df.mean(axis=1).astype(int).astype('Int64')
# If you want rounding:
# df['avg'] = df.mean(axis=1).round(0).astype('Int64')
print(df)
Output:
2018 2019 2020 avg
sales 123 <NA> 456 289
support 987 876 <NA> 931
CodePudding user response:
np.floor(pd.to_numeric(df['avg'], errors='coerce')).astype('Int64')