How to filter a Pandas table by some rules?

Time:05-25

I am trying to keep only the rows that satisfy all of the following conditions:

dataset = dataset[~dataset.duplicated(subset=['NUM'])]
dataset = [~dataset['NUM'].isin(invalidatedNumsDb)]
dataset = [dataset['NUM'].apply(checkNumber)]
dataset = [dataset['AGE'].apply(isAgeOld)]
dataset = [dataset['EMAIL'].apply(isEmail)]

Data is:

NUM AGE EMAIL
1   18  [email protected]
2   N   [email protected]
3   20   [email protected]

As a result I want to get the previous table with only this row:

3   20   [email protected]

Because 20 is true for isAgeOld, [email protected] is true for isEmail, 3 is true for checkNumber, and 3 is not present in the set invalidatedNumsDb.

The problem is that [~dataset['NUM'].isin(invalidatedNumsDb)] does not filter the rows; dataset then contains this result:

[0    False
 Name: NUM, dtype: bool]

Then the next rule fails:

dataset = [dataset['NUM'].apply(checkNumber)]

The set and the rule functions are defined as:

invalidatedNumsDb = {1,2}

def checkNumber(v):
  v < 100

def isAgeOld(v):
   return v > 18

def isEmail(v):
   return True

CodePudding user response:

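Change your checkNumber function to actually return a value (without a return statement it yields None). Then build one boolean mask per condition instead of wrapping each condition in a list, and combine the masks with & to index your dataset: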

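import pandas as pd

def checkNumber(v):
  return v < 100

# One boolean mask per rule
m1 = ~dataset.duplicated("NUM")
m2 = ~dataset['NUM'].isin(invalidatedNumsDb)
m3 = dataset['NUM'].apply(checkNumber)
# 'N' is coerced to NaN, and NaN > 18 is False
m4 = pd.to_numeric(dataset['AGE'], errors='coerce').apply(isAgeOld)
m5 = dataset['EMAIL'].apply(isEmail)

# Keep only the rows that pass every rule
output = dataset[m1 & m2 & m3 & m4 & m5]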

>>> output 
   NUM AGE           EMAIL
2    3  20  [email protected]
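For anyone who wants to run the whole thing end to end, here is a minimal self-contained sketch of the approach above. The sample frame is rebuilt from the table in the question. The e-mail strings are made-up placeholders (an assumption, since the originals are not visible), and AGE is assumed to be stored as strings because of the 'N' value. The name wrapped is only there to illustrate why the original code failed.

```python
import pandas as pd

# Sample data rebuilt from the question; e-mail values are placeholders (assumption).
dataset = pd.DataFrame({
    "NUM": [1, 2, 3],
    "AGE": ["18", "N", "20"],  # assumed to be strings because of the 'N' entry
    "EMAIL": ["a@example.com", "b@example.com", "c@example.com"],
})
invalidatedNumsDb = {1, 2}

def checkNumber(v):
    return v < 100

def isAgeOld(v):
    return v > 18

def isEmail(v):
    return True

# Wrapping a mask in [...] builds a plain Python list holding one Series.
# Nothing is filtered, and a later dataset['NUM'] lookup on that list
# would raise TypeError (list indices must be integers or slices).
wrapped = [~dataset["NUM"].isin(invalidatedNumsDb)]

# Non-numeric ages become NaN, which compares as False in isAgeOld.
age = pd.to_numeric(dataset["AGE"], errors="coerce")

# Combine one boolean mask per rule with & and index the frame once.
mask = (
    ~dataset.duplicated("NUM")
    & ~dataset["NUM"].isin(invalidatedNumsDb)
    & dataset["NUM"].apply(checkNumber)
    & age.apply(isAgeOld)
    & dataset["EMAIL"].apply(isEmail)
)
output = dataset[mask]
print(output)
```

Only the row with NUM 3 should survive, and it keeps its original index label 2, which matches the output shown above.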