Apply multiple criteria to select current and prior row

I have a dataframe like the one shown below:

person_id  source_system   r_diff
  1              O          NULL
  1              O           0
  1              O           9
  1              O           NULL
  2              O           574
  2              I           20
  2              O           135
  2              O           0
  2              I           21
  2              O           2
  2              O           0
  2              O           0
  2              I           12

I would like to select rows based on the criteria below.

Criterion 1: pick all rows where source_system = I.

Criterion 2: pick the prior row (n-1) only when the source_system of the (n-1)th row is O and its r_diff is zero.

Criterion 2 should be applied only when the nth row has source_system = I. If the source_system of the (n-1)th row is I, we don't have to do anything, because criterion 1 already handles that.

Both criteria have to be applied separately for each person.

I tried the code below, based on an SO suggestion, but I am not sure how to make it work:

m1 = df['source_system'] == 'I'
m2 = df['r_diff'] <= 0
m3 = df.groupby('person_id')['r_diff'].shift(-1) <= 0

df = df[m1 | m2 | m3]

I expect my output to look like this:

  2              I           20
  2              O           0
  2              I           21
  2              O           0
  2              I           12

CodePudding user response：

I prefer not to use a one-line solution, because it becomes hard to read once the code gets more complicated. It is better to build the masks separately:

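# Criterion 1: the row itself has source 'I'
m1 = df['source_system'] == 'I'
# Criterion 2, first half: r_diff is zero (NULL compares as False)
m2 = df['r_diff'] <= 0
# Criterion 2, second half: the next row of the same person has source 'I'
m3 = df.groupby('person_id')['source_system'].shift(-1) == 'I'

df = df[m1 | (m2 & m3)]

print(df)
    person_id source_system  r_diff
5           2             I    20.0
7           2             O     0.0
8           2             I    21.0
11          2             O     0.0
12          2             I    12.0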
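
To run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand, so the column name `source_system`, the use of NaN for the NULL entries, and the variable names (`is_i`, `is_zero`, `next_is_i`, `result`) are assumptions made for illustration. Like the masks above, it treats "zero" as `<= 0`.

```python
import pandas as pd

nan = float("nan")  # stands in for the NULL entries in the question's table

df = pd.DataFrame({
    "person_id": [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    "source_system": ["O", "O", "O", "O", "O", "I", "O", "O", "I", "O", "O", "O", "I"],
    "r_diff": [nan, 0, 9, nan, 574, 20, 135, 0, 21, 2, 0, 0, 12],
})

# Criterion 1: the row itself comes from source 'I'.
is_i = df["source_system"].eq("I")

# Criterion 2: r_diff is zero (NaN compares as False) ...
is_zero = df["r_diff"].le(0)
# ... and the next row of the same person comes from source 'I'.
next_is_i = df.groupby("person_id")["source_system"].shift(-1).eq("I")

result = df[is_i | (is_zero & next_is_i)]
print(result)
```

The filtered frame keeps the original index labels 5, 7, 8, 11 and 12, which makes it easy to check each kept row against the input.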

CodePudding user response：

You could do it in one line:

>>> df[df['source_system'].eq('I') | (df.groupby('person_id')['source_system'].shift(-1).eq('I') & df['r_diff'].le(0))]
    person_id source_system  r_diff
5           2             I    20.0
7           2             O     0.0
8           2             I    21.0
11          2             O     0.0
12          2             I    12.0
>>>
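
Whether the shift is grouped by `person_id` matters at the boundary between two people, which the sample data in the question does not exercise. The sketch below uses a small made-up dataframe (`toy` and its values are hypothetical, not from the question) in which person 1 ends with an `O` row whose `r_diff` is 0 and person 2 starts with an `I` row.

```python
import pandas as pd

# Hypothetical data: person 1 ends with an 'O' row whose r_diff is 0,
# and person 2 starts with an 'I' row.
toy = pd.DataFrame({
    "person_id": [1, 1, 2, 2],
    "source_system": ["O", "O", "I", "O"],
    "r_diff": [5, 0, 3, 0],
})

is_i = toy["source_system"].eq("I")
is_zero = toy["r_diff"].le(0)

# Plain shift: row 1 (person 1) sees row 2 (person 2) as its "next" row.
next_plain = toy["source_system"].shift(-1).eq("I")
# Grouped shift: the last row of each person has no next row.
next_grouped = toy.groupby("person_id")["source_system"].shift(-1).eq("I")

plain = toy[is_i | (is_zero & next_plain)]
grouped = toy[is_i | (is_zero & next_grouped)]

print(plain.index.tolist())
print(grouped.index.tolist())
```

Here the plain shift keeps rows 1 and 2, so a row of person 1 is selected because of a row that belongs to person 2. The grouped shift keeps only row 2, which matches the requirement that the criteria are applied per person.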