Groupby with both sum() and mean() [duplicate]

Time:09-24

I have a bunch of data frames that I concatenate into one large data frame. All rows have a datetime, a name, and then some columns with random values, e.g. the data frame could look something like:

df =

ds                    name       val1     val2     val3
-------------------------------------------------------
2021-07-31 23:23:00   name1      2        3        4
2021-07-31 23:56:00   name2      3        4        5
2021-07-31 23:11:00   name1      4        5        6
2021-07-31 23:34:00   name2      5        6        7

I now need to group these rows by name and divide them into 60-minute bins, which I currently do as follows:

final_df = df.groupby([pd.Grouper(freq="60min", key="ds"), "name"]).mean()

The output is a new data frame in which the rows are grouped by 60-minute bin and name, and each val column holds the mean of the values in that group.

This works. However, instead of taking the mean of every column, I would like val2 to be the sum of its values rather than the mean.

So basically the final output should be:

final_df =

ds                    name       val1     val2     val3
-------------------------------------------------------
2021-07-31 23:00:00   name1      3        8        5
2021-07-31 23:00:00   name2      4        10       6

Can this be done directly, or would I have to split my data frame in two and join the results afterwards?

Answer:

Use DataFrameGroupBy.agg with a dictionary:

df.groupby([pd.Grouper(freq="60min", key="ds"), "name"]).agg({'val1': 'mean', 'val2': 'sum', 'val3': 'mean'})
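
For reference, here is a minimal, self-contained sketch of the same approach. The data frame is rebuilt by hand from the sample rows in the question. The final reset_index() call is an addition, assumed here only to turn the group keys back into ordinary columns. The aggregations are passed as strings ('mean', 'sum'), which pandas resolves to its own grouped implementations.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "ds": pd.to_datetime([
        "2021-07-31 23:23:00",
        "2021-07-31 23:56:00",
        "2021-07-31 23:11:00",
        "2021-07-31 23:34:00",
    ]),
    "name": ["name1", "name2", "name1", "name2"],
    "val1": [2, 3, 4, 5],
    "val2": [3, 4, 5, 6],
    "val3": [4, 5, 6, 7],
})

# Group into 60-minute bins per name, with a different aggregation per column.
final_df = (
    df.groupby([pd.Grouper(freq="60min", key="ds"), "name"])
      .agg({"val1": "mean", "val2": "sum", "val3": "mean"})
      .reset_index()  # assumption: keys wanted as columns, not as a MultiIndex
)

print(final_df)
```

With this sample, val2 for name1 is the sum 3 + 5 and val2 for name2 is 4 + 6, while val1 and val3 remain per-group means. Columns left out of the dictionary are dropped from the result, so every column that should survive needs an entry.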