Home > database >  Python Pandas, Running Sum, based on previous rows value and grouped
Python Pandas, Running Sum, based on previous rows value and grouped

Time:07-26

I have a pandas dataframe along these lines, based on where a customer service case sits before being closed. Every time the case is edited and audit trial is captured. I want to generate a counter for each time the Department of a case changes from the department it was previously in.

ID Department Start Date End Date
A Sales 01/01/2022 02/01/2022
A Sales 02/01/2022 03/01/2022
A Operations 03/01/2022 04/01/2022
A Sales 04/01/2022 05/01/2022
B Finance 01/01/2022 02/01/2022
B Risk 02/01/2022 03/01/2022

The output I want to achieve is shown below, the part I am struggling with is getting the 'Count of Department Change' value to increase when the ticket returns to a department it has already been in.

ID Department Start Date End Date Count of Department Change
A Sales 01/01/2022 02/01/2022 0
A Sales 02/01/2022 03/01/2022 0
A Operations 03/01/2022 04/01/2022 1
A Sales 04/01/2022 05/01/2022 2
B Finance 01/01/2022 02/01/2022 0
B Risk 02/01/2022 03/01/2022 1

Using the following code I am able to flag when the department changes for a given case.

df['Dept_Change_Count'] = np.where((df['Department'] != df['Department'].shift(1)) & (df['ID'] == df['ID'].shift(1)), '1', '0')

I'm thinking I could use the df['Dept_Change_Count'] and a running sum along the ID to generate the output I'm after but I haven't had much luck so far.

Any help greatly appreciated!

CodePudding user response:

Compare previous and current row in Department per ID then again group by ID and calculate cumsum to generate counter

m = df['Department'] != df.groupby('ID')['Department'].shift()
df['Dept_Change_Count'] = m.groupby(df['ID']).cumsum() - 1

Alternative approach using a single groupby with lambda func to calculate cumsum:

df['Dept_Change_Count'] = df.groupby('ID')['Department']\
                            .apply(lambda s: (s != s.shift()).cumsum()) - 1

  ID  Department  Start Date    End Date  Dept_Change_Count
0  A       Sales  01/01/2022  02/01/2022                  0
1  A       Sales  02/01/2022  03/01/2022                  0
2  A  Operations  03/01/2022  04/01/2022                  1
3  A       Sales  04/01/2022  05/01/2022                  2
4  B     Finance  01/01/2022  02/01/2022                  0
5  B        Risk  02/01/2022  03/01/2022                  1
  • Related