Home > Blockchain >  Python ignore DataFrame rows after non-null value
Python ignore DataFrame rows after non-null value

Time:10-21

I have such DataFrame:

         date  unique_id  order_id
0  2022-09-20        111      NULL
1  2022-09-10        111      NULL
2  2022-08-10        111  2660a139
3  2022-08-08        111      NULL
4  2022-08-07        111      NULL
5  2022-08-04        222      NULL
6  2022-07-31        222  e61d1e7d
7  2022-07-20        222      NULL

Is it possible to clean data, so It would eliminate rows with null order_id`s after non-null value, based on unique_id? Outcome should look like this:

         date  unique_id  order_id
0  2022-08-10        111  2660a139
1  2022-08-08        111      NULL
2  2022-08-07        111      NULL
3  2022-07-31        222  e61d1e7d
4  2022-07-20        222      NULL

Thanks!

CodePudding user response:

Use GroupBy.cummax with Series.notna:

df = df[df['order_id'].notna().groupby(df['unique_id']).cummax()]
#laternative
#df = df[df['order_id'].ne('NULL').groupby(df['unique_id']).cummax()]
print (df)
         date  unique_id  order_id
2  2022-08-10        111  2660a139
3  2022-08-08        111       NaN
4  2022-08-07        111       NaN
6  2022-07-31        222  e61d1e7d
7  2022-07-20        222       NaN
  • Related