Home > Mobile >  Calculating column means based on values in other columns in pandas
Calculating column means based on values in other columns in pandas

Time:12-10

I have a pandas dataframe laid out like the following:

[Name_Date]          [Var_A]     [Var_1]    [Var_2]   ...
FooBar_09/2021          9           1          9
FooBar_09/2021          5           2          8
FooBar_09/2021          3           5          6
BarFoo_03/2020          8           3          2      
BarFoo_03/2020          3           4          4      ...
BarFoo_03/2020          4           3          6
BarBar_04/2017          3           1          5
BarBar_04/2017          7           1          3
BarBar_04/2017          1           3          1      ...

I'd like to create a new dataframe with unique values from [Name_Date], and the mean values from [Var_A] based on the groups in [Name_Date]. I've gotten this far with the following line:

df_mean = df.groupby('Name_Date', as_index=False)['Var_A'].mean()

What I'd like to do is then expand on this by calculating the mean of columns [Var1] and [Var2], and dividing them by the mean of [Var_A]. I am sure I could do this calculation one by one in a similar fashion to the line above, however I have about a dozen of these [Var] columns so I'm looking for a more expiditious way to do this if anyone can make any suggestions. The end result I'm trying to achieve can be seen below:

[Name_Date]        [Var_A_mean]   [mean Var_A / mean Var_1]   [mean Var_A / mean Var_2]
FooBar_09/2021         5.6               0.47                     1.3   
BarFoo_03/2020          5                0.66                     0.8
BarBar_04/2017         3.6               0.46                     0.83

Thanks for the help.

CodePudding user response:

Use groupby to compute the mean for all columns then div on index axis:

df_mean = df.groupby('Name_Date').mean()
df_mean.update(df_mean.iloc[:, 1:].div(df_mean['Var_A'], axis=0))
print(df_mean)

# Output:
                   Var_A     Var_1     Var_2
Name_Date                                   
BarBar_04/2017  3.666667  0.454545  0.818182
BarFoo_03/2020  5.000000  0.666667  0.800000
FooBar_09/2021  5.666667  0.470588  1.352941

CodePudding user response:

You can simply get the mean of all 3 columns, then calculate the div and rename them:

P.S, by the result numbers it seems like it's Var_1 / Var_A and Var_2 / Var_A, which is different from the names you provided

df_mean = df.groupby('Name_Date', as_index=False)[['Var_A', 'Var_1', 'Var_2']].mean()
df_mean['Var_1'] = df_mean['Var_1']/df_mean['Var_A']
df_mean['Var_2'] = df_mean['Var_2']/df_mean['Var_A']
df_mean.columns = ['Name_Date', 'Var_A_mean','mean Var_A / mean Var_1', 'mean Var_A / mean Var_2']
  • Related