Home > Blockchain >  Using agg function in Pandas makes column unconvertable from float to int
Using agg function in Pandas makes column unconvertable from float to int

Time:08-13

Im trying to find a "NaN safe" way to convert a column of basically whole numbers. I do not want NaN's to get converted into 0's as there is a difference (in my context) between a zero and a NaN. When i try to convert the original columns using astype('Int64') it goes through no problemo. But when i use an agg function such as mean() the resulting column wont convert to ints even though there are no NaNs in my MWE.

I tried many of the suggestions on this post to no avail: Convert Pandas column containing NaNs to dtype `int`

MWE and Results

#Sample Data
df = pd.DataFrame(data=[[123.0, np.NaN, 456.0], [987.0, 876.0,np.NaN]],
                  columns=['2018', '2019', '2020'],
                  index=['sales','support'])
df['avg'] = df.mean(axis=1)
df

       2018    2019     2020     avg
sales   123.0   NaN     456.0   289.5
support 987.0   876.0   NaN     931.5

#an original column converts no problem and keeps the NaNs
ts = df['2019'].astype('Int64') # --> works fine
ts[0]  #--> <NA>
type(ts[0]) # --> pandas._libs.missing.NAType

#column types appear to be the same
type(df['2018'][0]) # --> np.float64
type(df['avg'][0]) # --> np.float64

#craps out on agg created column
df['avg'].astype(float).astype('Int64') # --> fails with message " cannot safely cast non-equivalent float64 to int64"
df['avg'].astype(float).astype('Int64') # --> fails with message " cannot safely cast non-equivalent float64 to int64"
df['avg'].dtype(pd.Int64Dtype)   # --> fails with message "'numpy.dtype[float64]' object is not callable"
df['avg'].dtype(pd.Int64Dtype()) # --> fails with message "'numpy.dtype[float64]' object is not callable"
pd.to_numeric(df['avg']).astype('Int64', errors='ignore') # --> works but leaves me with a column of floats
round(df['avg'], 0) # --> remains a float even though it rounds off the decimals 932.0

CodePudding user response:

Given:

          2018   2019   2020
sales    123.0    NaN  456.0
support  987.0  876.0    NaN

Doing:

df = df.astype('Int64')
# If you just want truncation:
df['avg'] = df.mean(axis=1).astype(int).astype('Int64')
# If you want rounding:
# df['avg'] = df.mean(axis=1).round(0).astype('Int64')
print(df)

Output:

         2018  2019  2020  avg
sales     123  <NA>   456  289
support   987   876  <NA>  931

CodePudding user response:

np.floor(pd.to_numeric(df['avg'], errors='coerce')).astype('Int64')
  • Related