Home > Software engineering >  With pandas, how to create a table of average ocurrences, using more than one column?
With pandas, how to create a table of average ocurrences, using more than one column?

Time:02-13

Say I have a df that looks like this:

    name  day  var_A   var_B
0   Pete  Wed      4       5
1   Luck  Thu      1      10
2   Pete  Sun     10      10

And I want to sum var_A and var_B for every name/person and then get the average of this sum by the number of ocurrences of that name/person.

Let's take Pete for example. Sum his variables (in this case, (4 10) (5 10) = 29), and divide this sum by the ocurrences of Pete in the df (29/2 = 14,5). And the "day" column would be eliminated, there would be only one column for the name and another for the average. Would look like this:

>>> df.method().method()

        name   avg
    0   Pete  14.5
    1   Luck  11.0

I've been trying to do this using groupby and other methods, but I eventually got stuck. Any help would be appreciated.

CodePudding user response:

I came up with

df.groupby('name')['var_A', 'var_B'].apply(lambda g: g.stack().sum()/len(g)).rename('avg').reset_index()

which produces the correct result, but I'm not sure it's the most elegant way.

CodePudding user response:

pandas' groupby is a lazy expression, and as such it is reusable:


# create group
group = df.drop(columns="day").groupby("name")

# compute output
 group.sum().sum(1) / group.size()
 
name
Luck    11.0
Pete    14.5
dtype: float64
  • Related