Home > Back-end >  Better way to use pandas DataFrameGroupBy objects
Better way to use pandas DataFrameGroupBy objects

Time:11-28

Ok so this is more of a question about how to properly use the groupby method since I am kinda struggling to use the DataFrameGroupBy object itself. Basically I have a big DataFrame with the following structure:

DATE PRODUCT PRICE CAPACITY
01.07.2022 NEG_00_04 3,7 7
01.07.2022 NEG_00_04 1,7 3
01.07.2022 NEG_00_04 2,4 5
01.07.2022 NEG_00_04 2,2 7
01.07.2022 POS_00_04 3,7 2
01.07.2022 POS_00_04 3,2 5
01.07.2022 POS_00_04 1,5 2
01.07.2022 POS_00_04 2,4 3

My goal is to groupby the 'DATE' and 'PRODUCT' columns and get a cumulative capacity based on an ascending price. So basically the order of operation is to groupby the two columns then sort each group by the 'PRICE' column and calculate the cumulative capacity. the end result based on the sample table should look like this:

DATE PRODUCT PRICE CAPACITY CUMULATIVE
01.07.2022 NEG_00_04 1,7 3 3
01.07.2022 NEG_00_04 2,2 7 10
01.07.2022 NEG_00_04 2,4 5 15
01.07.2022 NEG_00_04 3,7 7 22
01.07.2022 POS_00_04 1,5 2 2
01.07.2022 POS_00_04 2,4 3 5
01.07.2022 POS_00_04 3,2 5 10
01.07.2022 POS_00_04 3,7 2 12

I already have a solution that does work but I was wondering if there isn't a better way to work with DataFrameGroupBy objects since I always just iterate through them with a for loop and it just doesn't seem right. This is how I did it:

df_result = pd.DataFrame()
for i, group in df.groupby(by=['DATE', 'PRODUCT']):
    group.sort_values('PRICE', inplace=True)
    group['CUMULATIVE'] = group['CAPACITY'].cumsum()
    df_result = pd.concat([df_result, group], ignore_index=True)

I would appreciate any suggestions for improvement :)

CodePudding user response:

Use:

df = df.sort_values('PRICE')
df['CUMULATIVE'] = df.groupby(by=['DATE', 'PRODUCT'])['CAPACITY'].cumsum()

Or:

df = df.sort_values(['PRICE','DATE', 'PRODUCT'])
df['CUMULATIVE'] = df.groupby(by=['DATE', 'PRODUCT'])['CAPACITY'].cumsum()
  • Related