Home > front end >  Pandas - Filter based on multiple conditions
Pandas - Filter based on multiple conditions

Time:05-25

I am facing a problem when I need to exclude some rows from by dataframe.

here the code:

import numpy as np
import pandas as pd
dff = {
    'ID': [1, 2, 3, 4, 5, 6, 7],
    'Default': [1,1,0,1,0,1,0],
    'Default_Amount': [1200,2000,0,350,0,760,0],
    'Tot_Amount': [1200,2000,3400,350,10000,760,7500],
    'Time' : ['October','March','November','November','January','December','January'],
    'Class': ['A','B','A','A','B','B','A']
}
dff = pd.DataFrame(dff)
display(dff)

dff[(dff.Time != 'November') & (dff.Default == 1) ]

What I am trying to do, is to exclude the row in dataframe that has the following two conditions: time = November with default = 1 (that is the ID 4 in the dataframe).

But if I execute this code "dff[(dff.Time != 'November') & (dff.Default == 1) ]", it excludes also the other time = "November" and default = 0.

How can I avoid this problem?

Thanks in advance!

CodePudding user response:

You need match not equal dff.Default != 1 with bitwise OR by |:

df = dff[(dff.Time != 'November') | (dff.Default != 1) ]

Or invert mask, but change | to & for bitwise AND and change != to ==:

df = dff[~((dff.Time == 'November') & (dff.Default == 1)) ]

CodePudding user response:

This is simple boolean algebra

Either follow De Morgan's law and invert both conditions while getting an OR (|):

dff[(dff.Time != 'November') | (dff.Default != 1) ]

or add (&) your two conditions, and negate (~):

dff[~((dff.Time == 'November') & (dff.Default == 1)) ]

CodePudding user response:

Why don't you simply use

dff[dff.ID != 4]

or otherwise you can use

dff[~((dff.Time == 'November') & (dff.Default == 1))]

The problem with you code is that (dff.Time != 'November') first selects all non-November rows and then (dff.Default == 1) selects all row with Default '1' only.

  • Related