pandas calculation grouping by ID-CodePudding

Is it possible to apply a calculation to a DF by ID. For example if multiple records share a similar ID make that calculation but do it throughout the DF?

df['Total Volume'] = df.groupby('ID')[df['VOLUME'].shift(-1)   df['ADDED'] - df['VOLUME']]

So I want to create the column "Total Volume" using the calculation

df['VOLUME'].shift(-1)   df['ADDED'] - df['VOLUME']

I've had some success with this logic:

df['Total Volume'] = df['VOLUME'].shift(-1)   df['ADDED'] - df['VOLUME']

But it doesn't take into account for the ID and the first record is always a total from another ID.

Ultimately the column should be blank or empty for the first record with every new ID.

CodePudding user response：

IIUC, you want to restrict the calculations to each "ID", right? Then you could use groupby shift on "VOLUME" and do everything else the same.

df['TOTAL VOLUME'] = df.groupby('ID')['VOLUME'].shift(-1)   df['ADDED'] - df['VOLUME']

Note that shift(-1) moves the next record up, so the end result is the last record in each "ID" is empty. Since you say:

Ultimately the column should be blank or empty for the first record with every new ID

I think you want shift(1) instead (by default it's 1):

df['TOTAL VOLUME'] = df.groupby('ID')['VOLUME'].shift()   df['ADDED'] - df['VOLUME']

Then again, the above code can be written even simpler using groupby diff:

df['TOTAL VOLUME'] = df['ADDED'] - df.groupby('ID')['VOLUME'].diff()