I have a pandas dataframe along these lines, based on where a customer service case sits before being closed. Every time the case is edited and audit trial is captured. I want to generate a counter for each time the Department of a case changes from the department it was previously in.
ID | Department | Start Date | End Date |
---|---|---|---|
A | Sales | 01/01/2022 | 02/01/2022 |
A | Sales | 02/01/2022 | 03/01/2022 |
A | Operations | 03/01/2022 | 04/01/2022 |
A | Sales | 04/01/2022 | 05/01/2022 |
B | Finance | 01/01/2022 | 02/01/2022 |
B | Risk | 02/01/2022 | 03/01/2022 |
The output I want to achieve is shown below, the part I am struggling with is getting the 'Count of Department Change' value to increase when the ticket returns to a department it has already been in.
ID | Department | Start Date | End Date | Count of Department Change |
---|---|---|---|---|
A | Sales | 01/01/2022 | 02/01/2022 | 0 |
A | Sales | 02/01/2022 | 03/01/2022 | 0 |
A | Operations | 03/01/2022 | 04/01/2022 | 1 |
A | Sales | 04/01/2022 | 05/01/2022 | 2 |
B | Finance | 01/01/2022 | 02/01/2022 | 0 |
B | Risk | 02/01/2022 | 03/01/2022 | 1 |
Using the following code I am able to flag when the department changes for a given case.
df['Dept_Change_Count'] = np.where((df['Department'] != df['Department'].shift(1)) & (df['ID'] == df['ID'].shift(1)), '1', '0')
I'm thinking I could use the df['Dept_Change_Count'] and a running sum along the ID to generate the output I'm after but I haven't had much luck so far.
Any help greatly appreciated!
CodePudding user response:
Compare previous and current row in Department
per ID
then again group
by ID
and calculate cumsum
to generate counter
m = df['Department'] != df.groupby('ID')['Department'].shift()
df['Dept_Change_Count'] = m.groupby(df['ID']).cumsum() - 1
Alternative approach using a single groupby
with lambda func to calculate cumsum:
df['Dept_Change_Count'] = df.groupby('ID')['Department']\
.apply(lambda s: (s != s.shift()).cumsum()) - 1
ID Department Start Date End Date Dept_Change_Count
0 A Sales 01/01/2022 02/01/2022 0
1 A Sales 02/01/2022 03/01/2022 0
2 A Operations 03/01/2022 04/01/2022 1
3 A Sales 04/01/2022 05/01/2022 2
4 B Finance 01/01/2022 02/01/2022 0
5 B Risk 02/01/2022 03/01/2022 1