Ok so this is more of a question about how to properly use the groupby method, since I am struggling to work with the DataFrameGroupBy object itself. Basically I have a big DataFrame with the following structure:
DATE | PRODUCT | PRICE | CAPACITY |
---|---|---|---|
01.07.2022 | NEG_00_04 | 3,7 | 7 |
01.07.2022 | NEG_00_04 | 1,7 | 3 |
01.07.2022 | NEG_00_04 | 2,4 | 5 |
01.07.2022 | NEG_00_04 | 2,2 | 7 |
01.07.2022 | POS_00_04 | 3,7 | 2 |
01.07.2022 | POS_00_04 | 3,2 | 5 |
01.07.2022 | POS_00_04 | 1,5 | 2 |
01.07.2022 | POS_00_04 | 2,4 | 3 |
My goal is to group by the 'DATE' and 'PRODUCT' columns and get a cumulative capacity in ascending price order. So basically the order of operations is: group by the two columns, then sort each group by the 'PRICE' column and calculate the cumulative capacity. The end result, based on the sample table, should look like this:
DATE | PRODUCT | PRICE | CAPACITY | CUMULATIVE |
---|---|---|---|---|
01.07.2022 | NEG_00_04 | 1,7 | 3 | 3 |
01.07.2022 | NEG_00_04 | 2,2 | 7 | 10 |
01.07.2022 | NEG_00_04 | 2,4 | 5 | 15 |
01.07.2022 | NEG_00_04 | 3,7 | 7 | 22 |
01.07.2022 | POS_00_04 | 1,5 | 2 | 2 |
01.07.2022 | POS_00_04 | 2,4 | 3 | 5 |
01.07.2022 | POS_00_04 | 3,2 | 5 | 10 |
01.07.2022 | POS_00_04 | 3,7 | 2 | 12 |
I already have a solution that works, but I was wondering if there isn't a better way to work with DataFrameGroupBy objects, since I always just iterate through them with a for loop and it just doesn't seem right. This is how I did it:
df_result = pd.DataFrame()
for i, group in df.groupby(by=['DATE', 'PRODUCT']):
    group.sort_values('PRICE', inplace=True)
    group['CUMULATIVE'] = group['CAPACITY'].cumsum()
    df_result = pd.concat([df_result, group], ignore_index=True)
I would appreciate any suggestions for improvement :)
CodePudding user response:
Use:
df = df.sort_values('PRICE')
df['CUMULATIVE'] = df.groupby(by=['DATE', 'PRODUCT'])['CAPACITY'].cumsum()
Or, to also keep the rows of each group together in the output:
df = df.sort_values(['PRICE','DATE', 'PRODUCT'])
df['CUMULATIVE'] = df.groupby(by=['DATE', 'PRODUCT'])['CAPACITY'].cumsum()
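For reference, here is a self-contained sketch of the second approach, with sample data reconstructed from the table in the question (note the PRICE values use decimal points here; if your source data uses comma decimals like "3,7", convert them to floats first, e.g. with str.replace(',', '.').astype(float)):

```python
import pandas as pd

# Sample data mirroring the question's table
df = pd.DataFrame({
    'DATE': ['01.07.2022'] * 8,
    'PRODUCT': ['NEG_00_04'] * 4 + ['POS_00_04'] * 4,
    'PRICE': [3.7, 1.7, 2.4, 2.2, 3.7, 3.2, 1.5, 2.4],
    'CAPACITY': [7, 3, 5, 7, 2, 5, 2, 3],
})

# Sort once up front, then let groupby compute the per-group running total;
# cumsum respects the row order of the sorted frame within each group
df = df.sort_values(['DATE', 'PRODUCT', 'PRICE'], ignore_index=True)
df['CUMULATIVE'] = df.groupby(['DATE', 'PRODUCT'])['CAPACITY'].cumsum()
print(df)
```

This avoids the explicit loop and the repeated pd.concat entirely: a single sort plus one vectorized groupby cumsum produces the CUMULATIVE column for all groups at once.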