The idea here is to keep all rows except those where SICCD is between 4900 and 4999 or between 6000 and 6999. However, the code below does not work. It seems to work if I split the filter into two separate lines, so I am looking for the correct single-line syntax.
crsp = crsp[~(crsp['SICCD'].between(4900, 4999)) | ~(crsp['SICCD'].between(6000, 6999))]
CodePudding user response:
Your code is basically correct, but you need to use &, not |. A value can never fall in both ranges at once, so at least one of the two conditions crsp['SICCD'].between(4900, 4999) and crsp['SICCD'].between(6000, 6999) is always false. Negated, at least one of them is always true, so combining the negations with | keeps every row. For example:
import random
import pandas as pd
test = random.choices(range(4000, 8000), k=20)
df = pd.DataFrame(test, columns=['Data'])
Sample data:
Data
0 6113
1 4681
2 6891
3 4991
4 6576
5 5087
6 6111
7 5364
8 6658
9 4072
10 4327
11 5517
12 5421
13 6814
14 7099
15 6058
16 4404
17 6397
18 4851
19 6606
Now filter:
df = df[~(df['Data'].between(4900,4999)) & ~(df['Data'].between(6000,6999))]
Output:
Data
1 4681
5 5087
7 5364
9 4072
10 4327
11 5517
12 5421
14 7099
16 4404
18 4851
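To make the reasoning about | versus & concrete, here is a minimal sketch on a small hand-picked frame. The sample values and the names in_first, in_second, or_mask, and_mask and demorgan_mask are made up for illustration and are not from the question's data. It also shows that the & form should be equivalent to negating the combined condition once (De Morgan's law):

```python
import pandas as pd

# Hypothetical sample values covering both ranges and the gaps around them
df = pd.DataFrame({'Data': [4072, 4950, 5087, 6113, 7099]})

in_first = df['Data'].between(4900, 4999)   # inclusive on both ends by default
in_second = df['Data'].between(6000, 6999)

# No value is in both ranges, so at least one negation is always True
or_mask = ~in_first | ~in_second

# True only where the value is in neither range
and_mask = ~in_first & ~in_second

# Same mask, written as a single negation of the combined condition
demorgan_mask = ~(in_first | in_second)

kept = df[and_mask]['Data'].tolist()
```

Here or_mask is True for every row, which is why the original filter removed nothing, while and_mask and demorgan_mask select the same rows.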
CodePudding user response:
I was able to accomplish this by using np.where():
import numpy as np
import pandas as pd
test = np.arange(3000, 8000)
df = pd.DataFrame(test, columns=['Data'])
df['Check'] = np.where((df['Data'].between(4900, 4999)) | (df['Data'].between(6000, 6999)), True, False)
df.loc[df['Check'] == False]
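As a side note, np.where(cond, True, False) should give back the same values as the boolean condition itself, so the helper column is optional. A minimal sketch, assuming the same 3000 to 7999 test range as above (the names excluded, via_where and result are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(3000, 8000), columns=['Data'])

# Boolean Series: True where the value falls in either excluded range
excluded = df['Data'].between(4900, 4999) | df['Data'].between(6000, 6999)

# The np.where version yields the same True/False values as a NumPy array
via_where = np.where(excluded, True, False)

# Filter directly with the negated mask, without adding a 'Check' column
result = df[~excluded]
```

Filtering with ~excluded keeps the frame free of the extra column, and the result can be assigned back to df if the filtered frame is what you want to keep.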