Say I have a df that looks like this:
name day var_A var_B
0 Pete Wed 4 5
1 Luck Thu 1 10
2 Pete Sun 10 10
And I want to sum var_A and var_B for every name/person and then get the average of this sum by the number of ocurrences of that name/person.
Let's take Pete for example. Sum his variables (in this case, (4 10) (5 10) = 29), and divide this sum by the ocurrences of Pete in the df (29/2 = 14,5). And the "day" column would be eliminated, there would be only one column for the name and another for the average. Would look like this:
>>> df.method().method()
name avg
0 Pete 14.5
1 Luck 11.0
I've been trying to do this using groupby and other methods, but I eventually got stuck. Any help would be appreciated.
CodePudding user response:
I came up with
df.groupby('name')['var_A', 'var_B'].apply(lambda g: g.stack().sum()/len(g)).rename('avg').reset_index()
which produces the correct result, but I'm not sure it's the most elegant way.
CodePudding user response:
pandas' groupby is a lazy expression, and as such it is reusable:
# create group
group = df.drop(columns="day").groupby("name")
# compute output
group.sum().sum(1) / group.size()
name
Luck 11.0
Pete 14.5
dtype: float64