I have a pandas dataframe laid out like the following:
[Name_Date] [Var_A] [Var_1] [Var_2] ...
FooBar_09/2021 9 1 9
FooBar_09/2021 5 2 8
FooBar_09/2021 3 5 6
BarFoo_03/2020 8 3 2
BarFoo_03/2020 3 4 4 ...
BarFoo_03/2020 4 3 6
BarBar_04/2017 3 1 5
BarBar_04/2017 7 1 3
BarBar_04/2017 1 3 1 ...
I'd like to create a new dataframe with unique values from [Name_Date], and the mean values from [Var_A] based on the groups in [Name_Date]. I've gotten this far with the following line:
df_mean = df.groupby('Name_Date', as_index=False)['Var_A'].mean()
What I'd like to do is then expand on this by calculating the mean of columns [Var1] and [Var2], and dividing them by the mean of [Var_A]. I am sure I could do this calculation one by one in a similar fashion to the line above, however I have about a dozen of these [Var] columns so I'm looking for a more expiditious way to do this if anyone can make any suggestions. The end result I'm trying to achieve can be seen below:
[Name_Date] [Var_A_mean] [mean Var_A / mean Var_1] [mean Var_A / mean Var_2]
FooBar_09/2021 5.6 0.47 1.3
BarFoo_03/2020 5 0.66 0.8
BarBar_04/2017 3.6 0.46 0.83
Thanks for the help.
CodePudding user response:
Use groupby
to compute the mean for all columns then div
on index axis:
df_mean = df.groupby('Name_Date').mean()
df_mean.update(df_mean.iloc[:, 1:].div(df_mean['Var_A'], axis=0))
print(df_mean)
# Output:
Var_A Var_1 Var_2
Name_Date
BarBar_04/2017 3.666667 0.454545 0.818182
BarFoo_03/2020 5.000000 0.666667 0.800000
FooBar_09/2021 5.666667 0.470588 1.352941
CodePudding user response:
You can simply get the mean of all 3 columns, then calculate the div and rename them:
P.S, by the result numbers it seems like it's Var_1 / Var_A
and Var_2 / Var_A
, which is different from the names you provided
df_mean = df.groupby('Name_Date', as_index=False)[['Var_A', 'Var_1', 'Var_2']].mean()
df_mean['Var_1'] = df_mean['Var_1']/df_mean['Var_A']
df_mean['Var_2'] = df_mean['Var_2']/df_mean['Var_A']
df_mean.columns = ['Name_Date', 'Var_A_mean','mean Var_A / mean Var_1', 'mean Var_A / mean Var_2']