I can fill the missing numerical values with the following Python code:
df.fillna(df.select_dtypes(include='number').mean().iloc[0], inplace=True)
But this fills every NaN with one and the same value (the mean of the first numeric column). I have a column with categorical variables, and I need to fill each NaN with the mean for its category in that column.
Edit: This is part of the df I'm working on. I want to fill each NaN with the mean of its own column, computed within the row's TFOPWG Disposition group.
TIC ID TFOPWG Disposition TESS Mag TESS Mag err RA
TOI
101.01 231663901 KP 12.4069 0.006 318.737000
102.01 149603524 KP 9.7109 0.006 87.139833
103.01 336732616 KP 11.5232 0.008 312.457500
104.01 231670397 KP 9.8638 0.006 319.949708
105.01 144065872 KP 9.4995 0.006 337.457833
Dec PM RA (mas/yr) PM RA err (mas/yr) PM Dec (mas/yr) \
TOI
101.01 -55.871864 12.641 0.044 -16.011
102.01 -63.988328 -15.641 0.037 26.046
103.01 -24.428694 10.426 0.070 15.620
104.01 -58.148933 10.552 0.045 -10.658
105.01 -48.003100 91.976 0.052 -6.861
Period (days) Stellar Distance (pc) Stellar Distance (pc) err \
TOI
101.01 1.430369 375.310 4.4110
102.01 4.411929 175.631 0.5880
103.01 3.547854 411.211 7.7520
104.01 4.087493 316.678 2.9655
105.01 2.184670 137.544 0.7905
Stellar Eff Temp (K) Stellar Eff Temp (K) err \
TOI
101.01 5600.0 NaN
102.01 6280.0 NaN
103.01 6351.0 NaN
104.01 6036.0 NaN
105.01 5630.0 NaN
Stellar log(g) (cm/s^2) err Stellar Radius (R_Sun) \
TOI
101.01 NaN 0.890774
102.01 NaN 1.210000
103.01 NaN 1.400000
104.01 NaN 2.218670
105.01 NaN 1.240000
Stellar Radius (R_Sun) err
TOI
101.01 0.043847
102.01 0.050000
103.01 NaN
104.01 0.102573
105.01 0.060000
CodePudding user response:
You can use groupby().transform() to place the group mean on every row, then pass the result to fillna:
df.fillna(df.groupby('category_column').transform('mean'), inplace=True)
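To see what that one-liner does, here is a minimal sketch on a toy frame. The column names are borrowed from the question, but the values are invented for illustration. It also shows one case you may run into with this data: if every value in a group is NaN for some column (as with the "err" columns for KP in the sample), the group mean is itself NaN and nothing gets filled. Falling back to the overall column mean is just one possible choice, not something the question asks for.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names from the question, values invented.
df = pd.DataFrame(
    {
        "TFOPWG Disposition": ["KP", "KP", "KP", "PC", "PC"],
        "TESS Mag": [12.0, np.nan, 10.0, 9.0, np.nan],
        "Stellar Radius (R_Sun)": [1.0, 2.0, np.nan, np.nan, 4.0],
        # Every KP value is missing here, so the KP group mean is NaN.
        "Stellar Eff Temp (K) err": [np.nan, np.nan, np.nan, 100.0, 120.0],
    }
)

# Restrict to numeric columns so the grouping key is not averaged.
numeric_cols = df.select_dtypes(include="number").columns

# Per-group mean, broadcast back to the original row positions.
group_means = df.groupby("TFOPWG Disposition")[numeric_cols].transform("mean")

# fillna aligns on index and column labels, so each NaN gets its own group's mean.
filled = df.fillna(group_means)

# Assumed fallback: anything still missing (all-NaN groups) gets the
# overall column mean instead.
filled = filled.fillna(df[numeric_cols].mean())

print(filled)
```

Assigning the result to a new variable instead of using inplace=True keeps the original frame around, which makes it easier to check which cells were actually filled.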