Home > Mobile >  Filtering with conditional conditions (pandas)
Filtering with conditional conditions (pandas)

Time:03-04

I have an array with multiple strings, some of them are none(0 or ''), and each of them should have their own conditions if exists. if the array at its position is none, I don't have to apply the filtering.

# df.columns = ['a','b','c','d','e']

# Case 1 
l = ['A', 'B', '','' , 123]
## DESIRED FILTERING
df[ (df.a=='A') & (df.b=='B') & (df.e == 123)]

# Case 2
l = ['z','' ,'' ,'', 123]
## DESIRED FILTERING
df[ (df.a=='z') & (df.e == 123) ]

This is my attempt, yet it failed cuz (df.col_name == 'something') returns a series.

#Case 1 for example

check_null = [ i!='' for i in l ]       # ->returns [true,false,...]
conditions = [ (df.a==l[0]),(df.b==l[1]),(df.c==l[2]), (df.d==l[3]), (df.e==l[4])]
filt = [conditions[i] for i in range(len(check_null)) if check_null[i]]

df[filt]

How do I manage to get this work?

CodePudding user response:

Create dictionary for non empty values, convert to Series and filtering in boolean indexing:

df = pd.DataFrame(columns = ['a','b','c','d','e'])
df.loc[0] = ['A', 'B','g' ,'h' , 123]
df.loc[1] = ['A', 'B','g' ,'h' , 52]

l = ['A', 'B','' ,'' , 123]

s = pd.Series(dict(zip(df.columns, l))).loc[lambda x: x != '']

df  = df[df[s.index].eq(s).all(axis=1)]
print (df)
   a  b  c  d    e
0  A  B  g  h  123

l = ['A', 'B', '','', '']
s = pd.Series(dict(zip(df.columns, l))).loc[lambda x: x != '']

df  = df[df[s.index].eq(s).all(axis=1)]
print (df)
   a  b  c  d    e
0  A  B  g  h  123
1  A  B  g  h   52

CodePudding user response:

You could create a list of boolean Serieses and use np.all on axis to mimic the & operator:

l = ['A', 'B', None, None, 123]
out = df[np.all([df[col].eq(i) for col, i in zip(df.columns, l) if i], axis=0)]

CodePudding user response:

You can use a Series for comparison.

Either ensure that the value matches the Series (df.eq(s)), or (|) that the Series contains a empty string (s.eq('')). Broadcasting magic will do the rest ;)

s = pd.Series(l, index=df.columns)

df2 = df[(df.eq(s)|s.eq('')).all(1)]

Example with ['A', 'B', '', '', 123]:

# input
   a  b  c  d    e
0  A  B  C  D  123
1  A  X  C  D  456

# output
   a  b  c  d    e
0  A  B  C  D  123
  • Related