Combining ~ | and between in pandas filter

Time:05-29

The idea here is to keep all values except those in the ranges 4900-4999 and 6000-6999. However, the code below does not work at all. It does seem to work if I split it into two separate lines. I am looking for the correct syntax to do this in a single expression.

crsp = crsp[~(crsp['SICCD'].between(4900, 4999)) | ~(crsp['SICCD'].between(6000, 6999))] 

CodePudding user response:

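Your code is basically correct, but you need to use &, not |. No value can fall in both ranges, so for every row at least one of the two conditions crsp['SICCD'].between(4900, 4999) and crsp['SICCD'].between(6000, 6999) is False. Once negated, at least one of them is True, so combining them with | gives True for every row and nothing is removed. For example: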

import random
import pandas as pd
test = random.choices(range(4000, 8000), k=20)
df = pd.DataFrame(test, columns=['Data'])

Sample data:

    Data
0   6113
1   4681
2   6891
3   4991
4   6576
5   5087
6   6111
7   5364
8   6658
9   4072
10  4327
11  5517
12  5421
13  6814
14  7099
15  6058
16  4404
17  6397
18  4851
19  6606

Now filter:

df = df[~(df['Data'].between(4900,4999)) & ~(df['Data'].between(6000,6999))]

Output:

    Data
1   4681
5   5087
7   5364
9   4072
10  4327
11  5517
12  5421
14  7099
16  4404
18  4851
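
The reason | fails and & works is De Morgan's law: ~a | ~b is the same as ~(a & b), while ~a & ~b is the same as ~(a | b). A minimal sketch of that check follows; the five values and the names in_a, in_b, keeps_everything and keep are made up for illustration and are not from the question's data.

```python
import pandas as pd

# Hand-picked illustrative values: two fall inside the excluded ranges
df = pd.DataFrame({"Data": [4681, 4950, 5087, 6500, 7099]})

in_a = df["Data"].between(4900, 4999)
in_b = df["Data"].between(6000, 6999)

# ~a | ~b equals ~(a & b); no value is in both ranges, so every row passes
keeps_everything = ~in_a | ~in_b

# ~a & ~b equals ~(a | b): rows that fall in neither range
keep = ~in_a & ~in_b

filtered = df[keep]
print(filtered)
```

Writing the filter as df[~(in_a | in_b)] gives the same rows and may read more naturally as "not in either range".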

CodePudding user response:

I was able to accomplish this using np.where():

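import numpy as np
import pandas as pd
test = np.arange(3000, 8000)
df = pd.DataFrame(test, columns=['Data'])
# Check is True for rows that fall inside either excluded range
df['Check'] = np.where((df['Data'].between(4900, 4999)) | (df['Data'].between(6000, 6999)), True, False)
# Keep only the rows outside both ranges
df.loc[df['Check'] == False]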
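
As a possible simplification of the approach above: between() already returns a boolean Series, so the np.where(..., True, False) step and the helper column are not strictly needed. The sketch below is one way to write it, using the same 3000-7999 test range; the names excluded and result are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Data": np.arange(3000, 8000)})

# The combined between() conditions are already a boolean mask
excluded = df["Data"].between(4900, 4999) | df["Data"].between(6000, 6999)

# Negate the mask to keep rows outside both ranges
result = df.loc[~excluded]
print(len(result))
```

This leaves df itself unchanged and avoids adding a Check column that would need dropping later.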