State transion count in Pandas considering single instance-CodePudding

I was working with a data frame as below:

Items    Run    Status
T1       R1      OK
T2       R1      OK
T3       R1      OK
T4       R1      OK
T1       R2      NOK
T2       R2      OK
T3       R2      NOK
T4       R2      OK
T1       R3      OK
T2       R3      NOK
T3       R3      OK
T4       R3      OK

I had to capture the state transition that occurred per Item(OK to NOK to OK and vice-versa) and also the number of times the state transition occurred. With the support of this forum, below code provided the result

# sort values by Items and Run
df = df.sort_values(by=['Items', 'Run'])

# identify changes in Status
m = df['Status'].ne(df.groupby('Items')['Status'].shift().bfill())

# convert boolean to integer
df['Transition'] = m.astype(int)

# count per group masking the non-first values with empty string
df['Count'] = (m.groupby(df['Items']).transform('sum')
            .mask(df['Items'].duplicated(), '') # optional
           )

Output:

   Items Run Status  Transition Count
0     T1  R1     OK           0     2
4     T1  R2    NOK           1      
8     T1  R3     OK           1      
1     T2  R1     OK           0     1
5     T2  R2     OK           0      
9     T2  R3    NOK           1      
2     T3  R1     OK           0     2
6     T3  R2    NOK           1      
10    T3  R3     OK           1      
3     T4  R1     OK           0     0
7     T4  R2     OK           0      
11    T4  R3     OK           0

However, if in my data frame a single run of an item is present, then the above code also provides the transition count as 1 for that item like below, item T5:

Items Run Status  Transition Count
    0     T1  R1     OK           0     2
    4     T1  R2    NOK           1      
    8     T1  R3     OK           1      
    1     T2  R1     OK           0     1
    5     T2  R2     OK           0      
    9     T2  R3    NOK           1      
    2     T3  R1     OK           0     2
    6     T3  R2    NOK           1      
    10    T3  R3     OK           1      
    3     T4  R1     OK           0     0
    7     T4  R2     OK           0      
    11    T4  R3     OK           0
    XX    T5  R1     OK           1      1

The expectation is for any entry with a single run, Transition and Count should be zero.

CodePudding user response：

My bad, this was my code. You can use:

df['Status'].ne(df.groupby('Items')['Status'].shift().fillna(df['Status']))

in place of

df['Status'].ne(df.groupby('Items')['Status'].shift().bfill())

# sort values by Items and Run
df = df.sort_values(by=['Items', 'Run'])

# identify changes in Status
m = df['Status'].ne(df.groupby('Items')['Status'].shift().fillna(df['Status']))

# convert boolean to integer
df['Transition'] = m.astype(int)

# count per group masking the non-first values with empty string
df['Count'] = (m.groupby(df['Items']).transform('sum')
                .mask(df['Items'].duplicated(), '') # optional
               )

df

output:

   Items Run Status  Transition Count
0     T1  R1     OK           0     2
4     T1  R2    NOK           1      
8     T1  R3     OK           1      
1     T2  R1     OK           0     1
5     T2  R2     OK           0      
9     T2  R3    NOK           1      
2     T3  R1     OK           0     2
6     T3  R2    NOK           1      
10    T3  R3     OK           1      
3     T4  R1     OK           0     0
7     T4  R2     OK           0      
11    T4  R3     OK           0      
XX    T5  R1     OK           0     0

CodePudding user response：

Try this:

df = df.sort_values(by=['Items', 'Run'])
shifted = df.shift()

df["Transition"] = df["Items"].eq(shifted["Items"]) & df["Status"].ne(shifted["Status"])
df["Count"] = df.groupby("Items")["Transition"].transform("sum")