How to count rows where condition is false Pandas?

Time:05-25

I have a NUM column. I want to keep only the rows where NUM is valid (isValid returns True) and:

  1. Update the current dataframe
  2. Store the count of invalid rows in the report dict

I tried this:

report["NUM"] =  dataset['NUM'].apply(~isValid).count()

This does not work. The dataframe is:

NUM AGE COUNTRY
1   18  USA
2   19  USA
3   30  AU

isValid is a function:

def isValid(value):
   return True

Update:

I now use this:

report["NUM"] =  (~dataset['NUM'].apply(checkNumber)).sum()

I get this warning:

  report["NUM"] =  (~dataset['NUM'].apply(luhn)).sum()
C:\Users\Oleh\AppData\Local\Temp\ipykernel_17284\2678582562.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

CodePudding user response:

If you want to count the rows where isValid outputs False:

(~dataset['NUM'].apply(isValid)).sum()

Output: 0 (the isValid shown in the question always returns True)
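As background on why .count() from the question does not give this number: Series.count() counts the non-missing entries, so True and False rows are counted alike, while .sum() treats True as 1 and False as 0. A minimal sketch, assuming a hypothetical validator is_even in place of isValid:

```python
import pandas as pd

dataset = pd.DataFrame(
    {"NUM": [1, 2, 3], "AGE": [18, 19, 30], "COUNTRY": ["USA", "USA", "AU"]}
)

def is_even(value):
    # Hypothetical stand-in for isValid: treats even numbers as valid.
    return value % 2 == 0

mask = dataset["NUM"].apply(is_even)  # boolean Series: False, True, False

n_rows = mask.count()      # counts non-missing entries, True and False alike
n_invalid = (~mask).sum()  # counts only the rows where the validator returned False
```

Here n_rows is the total number of rows, while n_invalid counts only the rows that failed the check.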

Edit: to store the count and filter the dataframe with the same mask:

m = dataset['NUM'].apply(isValid)
report["NUM"] = (~m).sum()
dataset2 = dataset[m]
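The SettingWithCopyWarning reported in the question usually appears when values are assigned into a DataFrame that was itself produced by filtering another one. The sketch below assumes that is what happened here. The validator is_positive and the CHECKED column are hypothetical, and .copy() is added so that later assignments go to an independent object:

```python
import pandas as pd

dataset = pd.DataFrame(
    {"NUM": [1, -2, 3], "AGE": [18, 19, 30], "COUNTRY": ["USA", "USA", "AU"]}
)
report = {}

def is_positive(value):
    # Hypothetical stand-in for isValid.
    return value > 0

m = dataset["NUM"].apply(is_positive)
report["NUM"] = int((~m).sum())  # number of invalid rows
dataset2 = dataset[m].copy()     # independent copy of the valid rows

# Assigning a new column to the copy leaves the original frame untouched.
dataset2["CHECKED"] = True
```

Without .copy(), the same assignment on dataset2 is the kind of statement that can trigger the warning.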

CodePudding user response:

import pandas as pd

def isValid(value):
   return True

my_df = pd.DataFrame({'NUM':[1,2,3], 'AGE':[18,19,20], 'COUNTRY':['USA','USA','AU']})

report = {'wrong_rows':(~my_df.NUM.apply(isValid)).sum()}

CodePudding user response:

You need

len(dataset[dataset['NUM'].map(isValid) == False])

Because

dataset['NUM'].apply(~isValid) 

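is wrong.

isValid is a function, and the ~ operator is not defined for function objects, so ~isValid raises a TypeError before apply is ever called. The negation has to be applied to the boolean Series that apply or map returns.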
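A quick way to check what ~ does in each position, using the isValid from the question on a small Series made up for the illustration:

```python
import pandas as pd

def isValid(value):
    return True

try:
    ~isValid  # unary ~ applied to the function object itself
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

s = pd.Series([1, 2, 3])
inverted = ~s.apply(isValid)  # ~ applied to the boolean Series instead
```

The first form fails with a TypeError, while the second flips each True/False value element by element.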

Also

dataset[col].apply(func) 

will return a Series holding the function's return value for each row, not a filtered dataset. If you want to filter out the False ones you need the

df[df[col] == True]

syntax. If you add a new column, say

df["valid"] = df[col].map(func)

You could then do

df.query("valid == False")

or something along those lines.
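Put together, the helper-column approach might look like the sketch below. The validator is_odd is hypothetical, and the query string uses the `not` spelling, which is another way to select the False rows:

```python
import pandas as pd

df = pd.DataFrame(
    {"NUM": [1, 2, 3], "AGE": [18, 19, 30], "COUNTRY": ["USA", "USA", "AU"]}
)

def is_odd(value):
    # Hypothetical validator used only for this illustration.
    return value % 2 == 1

df["valid"] = df["NUM"].map(is_odd)

invalid_rows = df[~df["valid"]]    # boolean-mask filtering
same_rows = df.query("not valid")  # the same selection as a query string
n_invalid = len(invalid_rows)      # number of rows that failed validation
```

Both selections return the same rows. The mask form avoids building a query string, while query can be easier to read when several conditions are combined.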
