First, I should say I have gone through probable solutions for this kind of problem but couldn't find a close match. My data frame looks like below
Type Run Status Message
P1 R1 OK
P1 R2 NOK Unable to connect
P1 R3 OK
P1 R4 NOK Unable to fetch
P2 R1 OK
P2 R2 OK
P2 R3 NOK Entry not present
P2 R4 NOK Entry not present
I want to concatenate all the unique error messages per 'Type' == NOK to a new column
Type Run Status Message Error
P1 R1 OK
P1 R2 NOK Unable to connect Unable to connect, Unable to fetch
P1 R3 OK
P1 R4 NOK Unable to fetch
P2 R1 OK
P2 R2 OK
P2 R3 NOK Entry not present Entry not present
P2 R4 NOK Entry not present
Any leads will be helpful
CodePudding user response:
You can use groupby
operations:
# identify NOK rows
m = df['Status'].eq('NOK')
# get index of first NOK per group
idx = m[m].groupby(df['Type']).cumcount().loc[lambda x: x==0].index
# initialize Error to empty string (optional)
df['Error'] = ''
# concatenate unique error messages per group
# and assign to first NOK per group
df.loc[idx, 'Error'] = (df.groupby('Type')['Message']
.agg(lambda g: ', '.join(dict.fromkeys(g.replace('', pd.NA).dropna())))
.tolist()
)
output:
Type Run Status Message Error
0 P1 R1 OK None
1 P1 R2 NOK Unable to connect Unable to connect, Unable to fetch
2 P1 R3 OK None
3 P1 R4 NOK Unable to fetch
4 P2 R1 OK None
5 P2 R2 OK None
6 P2 R3 NOK Entry not present Entry not present
7 P2 R4 NOK Entry not present