Home > Blockchain >  panda df: Subset frame by condition in 2 columns
panda df: Subset frame by condition in 2 columns

Time:02-16

I want to subset my df based on the condition of 2 columns: Date and Person

Condition:
Date must be NON-BLANK except when Person =='Peter'

Tried the code but doesn't work, it removes ALL rows with blank in Date

df= df[ (df[df.columns[1]]!='peter') & (pd.isnull (df[df.columns[0]])!=True)]

INPUT


Date        |  Person  | 
            | tom      |
            | peter    |
            | jack     |
2010-10-4   | harry    |
2010-12-4   | sarah  |

DESIRED OUTPUT


Date        |  Person  | 
            | peter    |
2010-10-4   | harry    |
2010-12-4   | sarah  |

CodePudding user response:

Chain condition by | for bitwise OR by columns names:

df = df[df['Person'].eq('peter') | df['Date'].notna()]

Or by positions - selected columns by DataFrame.iloc:

df = df[df.iloc[:, 1].eq('peter') | df.iloc[:, 0].notna()]
print (df)
        Date Person
1        NaN  peter
3  2010-10-4  harry
4  2010-12-4  sarah

CodePudding user response:

You need to use | (boolean OR):

  # keep if peter               OR    if the data is NOT null
df[df[df.columns[1]].eq('peter') | ~df[df.columns[0]].isnull()]
  • Related