How can I fill NaN values in a df using the group mean?

Time:11-14

I can fill missing numerical values with the following Python code:

df.fillna(df.select_dtypes(include='number').mean().iloc[0], inplace=True)

But this only fills NaN with a single overall mean. I have a column of categorical values, and I need to fill each NaN with the mean of its category in that column.

Edit: This is part of the df I'm working on. I want to fill each NaN with the mean of its column, computed within the group given by its TFOPWG Disposition label.

          TIC ID  TFOPWG Disposition  TESS Mag  TESS Mag err    RA  
TOI                                                                        
101.01  231663901                 KP   12.4069         0.006  318.737000   
102.01  149603524                 KP    9.7109         0.006   87.139833   
103.01  336732616                 KP   11.5232         0.008  312.457500   
104.01  231670397                 KP    9.8638         0.006  319.949708   
105.01  144065872                 KP    9.4995         0.006  337.457833   

              Dec  PM RA (mas/yr)  PM RA err (mas/yr)  PM Dec (mas/yr)  \
TOI                                                                      
101.01 -55.871864          12.641               0.044          -16.011   
102.01 -63.988328         -15.641               0.037           26.046   
103.01 -24.428694          10.426               0.070           15.620   
104.01 -58.148933          10.552               0.045          -10.658   
105.01 -48.003100          91.976               0.052           -6.861   

        Period (days)  Stellar Distance (pc)  Stellar Distance (pc) err  \
TOI                                                                       
101.01       1.430369                375.310                     4.4110   
102.01       4.411929                175.631                     0.5880   
103.01       3.547854                411.211                     7.7520   
104.01       4.087493                316.678                     2.9655   
105.01       2.184670                137.544                     0.7905   

        Stellar Eff Temp (K)  Stellar Eff Temp (K) err  \
TOI                                                      
101.01                5600.0                       NaN   
102.01                6280.0                       NaN   
103.01                6351.0                       NaN   
104.01                6036.0                       NaN   
105.01                5630.0                       NaN   

        Stellar log(g) (cm/s^2) err  Stellar Radius (R_Sun)  \
TOI                                                           
101.01                          NaN                0.890774   
102.01                          NaN                1.210000   
103.01                          NaN                1.400000   
104.01                          NaN                2.218670   
105.01                          NaN                1.240000   

        Stellar Radius (R_Sun) err  
TOI                                 
101.01                    0.043847  
102.01                    0.050000  
103.01                         NaN  
104.01                    0.102573  
105.01                    0.060000  

CodePudding user response:

You can use groupby().transform('mean') to place each group's mean on every row of that group, then pass the result to fillna:

df.fillna(df.groupby('category_column').transform('mean'), inplace=True)
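
For illustration, here is a minimal sketch of the same idea on a small made-up frame. The column names are borrowed from the question, but the values are invented, and restricting the fill to numeric columns is an assumption about what is wanted:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names come from the question, values are made up.
df = pd.DataFrame(
    {
        "TFOPWG Disposition": ["KP", "KP", "KP", "CP", "CP"],
        "TESS Mag": [12.0, np.nan, 10.0, 8.0, np.nan],
        "Stellar Radius (R_Sun)": [1.0, 2.0, np.nan, np.nan, 4.0],
    }
)

group_col = "TFOPWG Disposition"
# Only numeric columns have a meaningful mean; the group column is object dtype.
num_cols = df.select_dtypes(include="number").columns.tolist()

# transform("mean") returns a frame with df's index, holding each row's group mean.
group_means = df.groupby(group_col)[num_cols].transform("mean")

# fillna aligns on index and column labels, so each NaN gets its own group's column mean.
filled = df.copy()
filled[num_cols] = df[num_cols].fillna(group_means)
print(filled)
```

If a column is NaN for every row of some group, that group's mean is also NaN and those cells stay empty. One possible follow-up is a second fillna with the overall column mean.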