I have a NUM column. I want to keep only the rows where NUM is valid (isValid returns True) and:
- Update the current dataframe
- Insert the count of invalid rows into a dict called report
I tried this:
report["NUM"] = dataset['NUM'].apply(~isValid).count()
This does not work for me. The dataframe is:
NUM AGE COUNTRY
1 18 USA
2 19 USA
3 30 AU
isValid is a function:
def isValid(value):
    return True
Remark:
I use this rule:
report["NUM"] = (~dataset['NUM'].apply(checkNumber)).sum()
I get this warning:
report["NUM"] = (~dataset['NUM'].apply(luhn)).sum()
C:\Users\Oleh\AppData\Local\Temp\ipykernel_17284\2678582562.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
CodePudding user response:
If you want to count the rows where isValid outputs False:
(~dataset['NUM'].apply(isValid)).sum()
Output: 0
Edit: to store the count in report and keep only the valid rows:
m = dataset['NUM'].apply(isValid)
report["NUM"] = (~m).sum()
dataset2 = dataset[m]
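Putting those three lines together, here is a minimal self-contained sketch. The sample data is from the question. The isValid rule (even numbers are valid) is a made-up stand-in so there is something to count, and the .copy() is only my guess at what would silence the SettingWithCopyWarning from the question, assuming it comes from assigning into a filtered slice later on.

```python
import pandas as pd

def isValid(value):
    # Hypothetical rule for illustration only: even numbers count as valid.
    return value % 2 == 0

dataset = pd.DataFrame({
    "NUM": [1, 2, 3],
    "AGE": [18, 19, 30],
    "COUNTRY": ["USA", "USA", "AU"],
})
report = {}

# Boolean mask: True where the row is valid.
m = dataset["NUM"].apply(isValid)

# Summing the inverted mask counts the invalid rows.
report["NUM"] = int((~m).sum())

# Keep only the valid rows. Assumption: .copy() gives an independent frame,
# so later column assignments do not touch a slice of the original.
dataset = dataset[m].copy()

print(report)
print(dataset)
```

With this stand-in rule, rows 1 and 3 are invalid, so report["NUM"] is 2 and only the row with NUM 2 remains.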
CodePudding user response:
import pandas as pd

def isValid(value):
    return True

my_df = pd.DataFrame({'NUM':[1,2,3], 'AGE':[18,19,20], 'COUNTRY':['USA','USA','AU']})
# Invert the boolean result of apply, then sum to count the invalid rows
report = {'wrong_rows':(~my_df.NUM.apply(isValid)).sum()}
CodePudding user response:
You need
len(dataset[dataset['NUM'].map(isValid) == False])
Because
dataset['NUM'].apply(~isValid)
is just wrong.
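isValid is a function, so ~isValid applies the bitwise NOT operator to the function object itself. That raises TypeError (bad operand type for unary ~: 'function') before apply is ever called. You have to negate the result of apply, not the function.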
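A quick check, as a sketch, of what that expression actually does. Python evaluates ~isValid before apply ever runs, and unary ~ is not defined for function objects. The trivial isValid is the one from the question; no pandas is needed to see it.

```python
def isValid(value):
    return True

try:
    inverted = ~isValid  # unary ~ applied to the function object itself
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# By contrast, `not isValid` is legal: a function object is truthy,
# so the expression is simply False and never calls the function.
print(not isValid)
```

Either way the function is never called on the column values, which is why the inversion has to be applied to the Series that apply returns.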
Also
dataset[col].apply(func)
will return a Series holding the value returned by the function for each row, not a filtered dataframe. If you want to filter out the False ones you need the
df[df[col]==True]
syntax. If you had a new column, say
df["valid"] = dataset[col].map(func)
you could then do
df.query("valid == False")
or something of the sort.
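Here is a small sketch of the helper-column approach, using the data from the question. The func rule (values above 1 are valid) is hypothetical and only there to produce one invalid row. I compare with == inside query because, as far as I know, query does not support the is operator.

```python
import pandas as pd

def func(value):
    # Hypothetical rule for illustration only: values above 1 are valid.
    return value > 1

df = pd.DataFrame({
    "NUM": [1, 2, 3],
    "AGE": [18, 19, 30],
    "COUNTRY": ["USA", "USA", "AU"],
})

# Store the validation result in its own boolean column.
df["valid"] = df["NUM"].map(func)

# Rows that failed validation, selected with query.
invalid_rows = df.query("valid == False")

# Rows that passed, selected with a plain boolean mask.
valid_rows = df[df["valid"]]

wrong_count = len(invalid_rows)
print(wrong_count)
```

Keeping the flag as a column is handy if you need both the count and the filtered frame, since the function only runs once.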