I am working on a panda data frame like the below:
Items Run Status
T1 R1 OK
T2 R1 OK
T3 R1 OK
T4 R1 OK
T1 R2 NOK
T2 R2 OK
T3 R2 NOK
T4 R2 OK
T1 R3 OK
T2 R3 NOK
T3 R3 OK
T4 R3 OK
I want to capture the state transition that occurred per Item(OK to NOK to OK and vice-versa) and also the number of times the state transition occurred like below:
Items Run Status Transition Count of Transition per Items
T1 R1 OK 0 1
T1 R2 NOK 1
T1 R3 OK 0
T2 R1 OK 0 1
T2 R2 OK 0
T2 R3 NOK 1
T3 R1 OK 0 2
T3 R2 NOK 1
T3 R3 OK 1
T4 R1 OK 0 0
T4 R2 OK 0
T4 R3 OK 0
CodePudding user response:
IIUC, you can use:
# sort values by Items and Run
df = df.sort_values(by=['Items', 'Run'])
# identify changes in Status
m = df['Status'].ne(df.groupby('Items')['Status'].shift().bfill())
# convert boolean to integer
df['Transition'] = m.astype(int)
# count per group masking the non-first values with empty string
df['Count'] = (m.groupby(df['Items']).transform('sum')
.mask(df['Items'].duplicated(), '') # optional
)
output:
Items Run Status Transition Count
0 T1 R1 OK 0 2
4 T1 R2 NOK 1
8 T1 R3 OK 1
1 T2 R1 OK 0 1
5 T2 R2 OK 0
9 T2 R3 NOK 1
2 T3 R1 OK 0 2
6 T3 R2 NOK 1
10 T3 R3 OK 1
3 T4 R1 OK 0 0
7 T4 R2 OK 0
11 T4 R3 OK 0