I am trying to keep only the rows that satisfy all of the following conditions:
dataset = dataset[~dataset.duplicated(subset=['NUM'])]
dataset = [~dataset['NUM'].isin(invalidatedNumsDb)]
dataset = [dataset['NUM'].apply(checkNumber)]
dataset = [dataset['AGE'].apply(isAgeOld)]
dataset = [dataset['EMAIL'].apply(isEmail)]
The data is:
NUM AGE EMAIL
1 18 [email protected]
2 N [email protected]
3 20 [email protected]
As a result, I want to get the table above reduced to this row:
3 20 [email protected]
Because 20 is true for isAgeOld
[email protected] is true for isEmail
3 is true for checkNumber
and 3 is not present in the set invalidatedNumsDb.
The problem is that after [~dataset['NUM'].isin(invalidatedNumsDb)]
runs, dataset contains this result:
[0 False
Name: NUM, dtype: bool]
Then the next rule fails:
dataset = [dataset['NUM'].apply(checkNumber)]
invalidatedNumsDb = {1, 2}
def checkNumber(v):
    v < 100
def isAgeOld(v):
    return v > 18
def isEmail(v):
    return True
CodePudding user response:
Change your checkNumber function so that it actually returns a value. Then build one boolean mask per condition and apply them all to your dataset at once:
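import pandas as pd

def checkNumber(v):
    return v < 100

m1 = ~dataset.duplicated("NUM")                # keep the first occurrence of each NUM
m2 = ~dataset['NUM'].isin(invalidatedNumsDb)   # NUM is not in the invalidated set
m3 = dataset['NUM'].apply(checkNumber)
# Non-numeric ages such as 'N' become NaN, and NaN > 18 evaluates to False
m4 = pd.to_numeric(dataset['AGE'], errors='coerce').apply(isAgeOld)
m5 = dataset['EMAIL'].apply(isEmail)
output = dataset[m1 & m2 & m3 & m4 & m5]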
>>> output
NUM AGE EMAIL
2 3 20 [email protected]
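For reference, here is a minimal self-contained sketch of the whole thing, assuming the sample data from the question. The e-mail addresses are placeholders I made up, and AGE is assumed to be stored as strings because of the 'N' entry. It also shows why the original attempt went wrong: wrapping a mask in [...] only builds a Python list holding one Series, whereas dataset[mask] is what actually filters.

```python
import pandas as pd

# Sample data mirroring the question; e-mail values are placeholders (assumption).
dataset = pd.DataFrame({
    "NUM": [1, 2, 3],
    "AGE": ["18", "N", "20"],  # assumed to be strings because of the "N" entry
    "EMAIL": ["a@example.com", "b@example.com", "c@example.com"],
})
invalidatedNumsDb = {1, 2}

def checkNumber(v):
    return v < 100

def isAgeOld(v):
    return v > 18

def isEmail(v):
    return True

# This is what the question's code did: a list containing one boolean Series.
# Nothing is filtered, and the DataFrame is lost if this is assigned to dataset.
wrapped = [~dataset["NUM"].isin(invalidatedNumsDb)]

# Convert AGE to numbers first; "N" becomes NaN, and NaN > 18 is False.
age = pd.to_numeric(dataset["AGE"], errors="coerce")

# Combine all conditions into one mask, then index the DataFrame with it.
mask = (
    ~dataset.duplicated(subset=["NUM"])
    & ~dataset["NUM"].isin(invalidatedNumsDb)
    & dataset["NUM"].apply(checkNumber)
    & age.apply(isAgeOld)
    & dataset["EMAIL"].apply(isEmail)
)
output = dataset[mask]
print(output)
```

Building all masks against the unfiltered dataset and combining them with & keeps their indexes aligned, so the order of the conditions does not matter.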