Home > Back-end >  pandas calculation grouping by ID
pandas calculation grouping by ID

Time:03-15

Is it possible to apply a calculation to a DF by ID. For example if multiple records share a similar ID make that calculation but do it throughout the DF?

df['Total Volume'] = df.groupby('ID')[df['VOLUME'].shift(-1)   df['ADDED'] - df['VOLUME']]

So I want to create the column "Total Volume" using the calculation

df['VOLUME'].shift(-1)   df['ADDED'] - df['VOLUME']

I've had some success with this logic:

df['Total Volume'] = df['VOLUME'].shift(-1)   df['ADDED'] - df['VOLUME']

But it doesn't take into account for the ID and the first record is always a total from another ID.

Ultimately the column should be blank or empty for the first record with every new ID.

CodePudding user response:

IIUC, you want to restrict the calculations to each "ID", right? Then you could use groupby shift on "VOLUME" and do everything else the same.

df['TOTAL VOLUME'] = df.groupby('ID')['VOLUME'].shift(-1)   df['ADDED'] - df['VOLUME']

Note that shift(-1) moves the next record up, so the end result is the last record in each "ID" is empty. Since you say:

Ultimately the column should be blank or empty for the first record with every new ID

I think you want shift(1) instead (by default it's 1):

df['TOTAL VOLUME'] = df.groupby('ID')['VOLUME'].shift()   df['ADDED'] - df['VOLUME']

Then again, the above code can be written even simpler using groupby diff:

df['TOTAL VOLUME'] = df['ADDED'] - df.groupby('ID')['VOLUME'].diff()
  • Related