I have a bunch of data frames that I concatenate into one large data frame. All rows have a datetime, a name, and then some columns with random values, e.g. the data frame could look something like:
df =
ds                   name   val1  val2  val3
--------------------------------------------
2021-07-31 23:23:00  name1     2     3     4
2021-07-31 23:56:00  name2     3     4     5
2021-07-31 23:11:00  name1     4     5     6
2021-07-31 23:34:00  name2     5     6     7
I now need to group these rows by name and divide them into 60-minute bins, which I currently do as follows:
final_df = df.groupby([pd.Grouper(freq="60min", key="ds"), "name"]).mean()
The output is a new data frame where rows are grouped by name, and each val column holds the mean of that column's values for that name within the bin.
This works. However, instead of taking the mean of every column, I would like one column, say val2, to be the sum of the values rather than the mean.
So basically the final output should be:
df_final =
ds                   name   val1  val2  val3
--------------------------------------------
2021-07-31 23:00:00  name1     3     8     5
2021-07-31 23:00:00  name2     4    10     6
Can this be done in any way, or would I have to split my data frame in two and join the results afterwards?
CodePudding user response:
Use DataFrameGroupBy.agg with a dictionary that maps each column to its aggregation function:
df.groupby([pd.Grouper(freq="60min", key="ds"), "name"]).agg({'val1': 'mean', 'val2': 'sum', 'val3': 'mean'})
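A minimal runnable sketch of this approach, rebuilding the sample frame from the question (note that with the sample data, name1's val2 sum is 3 + 5 = 8):

```python
import pandas as pd

# Sample data mirroring the question's frame.
df = pd.DataFrame({
    "ds": pd.to_datetime([
        "2021-07-31 23:23:00", "2021-07-31 23:56:00",
        "2021-07-31 23:11:00", "2021-07-31 23:34:00",
    ]),
    "name": ["name1", "name2", "name1", "name2"],
    "val1": [2, 3, 4, 5],
    "val2": [3, 4, 5, 6],
    "val3": [4, 5, 6, 7],
})

# One aggregation per column: mean for val1/val3, sum for val2.
# reset_index() turns the (ds, name) group keys back into columns.
out = (
    df.groupby([pd.Grouper(freq="60min", key="ds"), "name"])
      .agg({"val1": "mean", "val2": "sum", "val3": "mean"})
      .reset_index()
)
print(out)
```

Passing the aggregations as strings ('mean', 'sum') keeps the dictionary uniform and lets pandas use its optimized built-in implementations.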