I was working with a data frame as below:
Items Run Status
T1 R1 OK
T2 R1 OK
T3 R1 OK
T4 R1 OK
T1 R2 NOK
T2 R2 OK
T3 R2 NOK
T4 R2 OK
T1 R3 OK
T2 R3 NOK
T3 R3 OK
T4 R3 OK
I had to capture the state transition that occurred per Item(OK to NOK to OK and vice-versa) and also the number of times the state transition occurred. With the support of this forum, below code provided the result
# sort values by Items and Run
df = df.sort_values(by=['Items', 'Run'])
# identify changes in Status
m = df['Status'].ne(df.groupby('Items')['Status'].shift().bfill())
# convert boolean to integer
df['Transition'] = m.astype(int)
# count per group masking the non-first values with empty string
df['Count'] = (m.groupby(df['Items']).transform('sum')
.mask(df['Items'].duplicated(), '') # optional
)
Output:
Items Run Status Transition Count
0 T1 R1 OK 0 2
4 T1 R2 NOK 1
8 T1 R3 OK 1
1 T2 R1 OK 0 1
5 T2 R2 OK 0
9 T2 R3 NOK 1
2 T3 R1 OK 0 2
6 T3 R2 NOK 1
10 T3 R3 OK 1
3 T4 R1 OK 0 0
7 T4 R2 OK 0
11 T4 R3 OK 0
However, if in my data frame a single run of an item is present, then the above code also provides the transition count as 1 for that item like below, item T5:
Items Run Status Transition Count
0 T1 R1 OK 0 2
4 T1 R2 NOK 1
8 T1 R3 OK 1
1 T2 R1 OK 0 1
5 T2 R2 OK 0
9 T2 R3 NOK 1
2 T3 R1 OK 0 2
6 T3 R2 NOK 1
10 T3 R3 OK 1
3 T4 R1 OK 0 0
7 T4 R2 OK 0
11 T4 R3 OK 0
XX T5 R1 OK 1 1
The expectation is for any entry with a single run, Transition and Count should be zero.
CodePudding user response:
My bad, this was my code. You can use:
df['Status'].ne(df.groupby('Items')['Status'].shift().fillna(df['Status']))
in place of
df['Status'].ne(df.groupby('Items')['Status'].shift().bfill())
# sort values by Items and Run
df = df.sort_values(by=['Items', 'Run'])
# identify changes in Status
m = df['Status'].ne(df.groupby('Items')['Status'].shift().fillna(df['Status']))
# convert boolean to integer
df['Transition'] = m.astype(int)
# count per group masking the non-first values with empty string
df['Count'] = (m.groupby(df['Items']).transform('sum')
.mask(df['Items'].duplicated(), '') # optional
)
df
output:
Items Run Status Transition Count
0 T1 R1 OK 0 2
4 T1 R2 NOK 1
8 T1 R3 OK 1
1 T2 R1 OK 0 1
5 T2 R2 OK 0
9 T2 R3 NOK 1
2 T3 R1 OK 0 2
6 T3 R2 NOK 1
10 T3 R3 OK 1
3 T4 R1 OK 0 0
7 T4 R2 OK 0
11 T4 R3 OK 0
XX T5 R1 OK 0 0
CodePudding user response:
Try this:
df = df.sort_values(by=['Items', 'Run'])
shifted = df.shift()
df["Transition"] = df["Items"].eq(shifted["Items"]) & df["Status"].ne(shifted["Status"])
df["Count"] = df.groupby("Items")["Transition"].transform("sum")