I have such DataFrame:
date unique_id order_id
0 2022-09-20 111 NULL
1 2022-09-10 111 NULL
2 2022-08-10 111 2660a139
3 2022-08-08 111 NULL
4 2022-08-07 111 NULL
5 2022-08-04 222 NULL
6 2022-07-31 222 e61d1e7d
7 2022-07-20 222 NULL
Is it possible to clean data, so It would eliminate rows with null order_id`s after non-null value, based on unique_id? Outcome should look like this:
date unique_id order_id
0 2022-08-10 111 2660a139
1 2022-08-08 111 NULL
2 2022-08-07 111 NULL
3 2022-07-31 222 e61d1e7d
4 2022-07-20 222 NULL
Thanks!
CodePudding user response:
Use GroupBy.cummax
with Series.notna
:
df = df[df['order_id'].notna().groupby(df['unique_id']).cummax()]
#laternative
#df = df[df['order_id'].ne('NULL').groupby(df['unique_id']).cummax()]
print (df)
date unique_id order_id
2 2022-08-10 111 2660a139
3 2022-08-08 111 NaN
4 2022-08-07 111 NaN
6 2022-07-31 222 e61d1e7d
7 2022-07-20 222 NaN