(Python) Trying to filter a DataFrame based on Boolean indices/mask, result has a bunch of NaN

Time:09-23

Say one has a DataFrame, df, such that

>df

        Col1    Col2    
0       6       0   
1       8       12  
2       10      4   
3       -5      6   

If you reference a column, only that column is returned:

>df['Col1']

        Col1        
0       6           
1       8           
2       10      
3       -5      

If you then tack a Boolean expression onto the end, it'll evaluate for each row:

>df['Col1'] < 7

        Col1        
0       True            
1       False           
2       False       
3       True

If you then wrap that in brackets with the original DataFrame, it'll filter the original DataFrame:

>df[df['Col1'] < 7]

        Col1    Col2    
0       6       0       
3       -5      6    
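
The steps above can be reproduced with a short script. This is a minimal sketch that rebuilds df from the values shown; the names col, mask and filtered are illustrative and do not come from the original post.

```python
import pandas as pd

# Rebuild the example DataFrame from the values shown above.
df = pd.DataFrame({"Col1": [6, 8, 10, -5], "Col2": [0, 12, 4, 6]})

# Selecting a single column returns a 1-D Series.
col = df["Col1"]

# Comparing the Series to a scalar gives a Boolean Series, one value per row.
mask = col < 7

# Indexing the DataFrame with a Boolean Series keeps only the rows
# where the mask is True.
filtered = df[mask]

print(filtered)
```

Here the mask is 1-D, so pandas treats it as a row selector and the rows at index 0 and 3 are kept.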

All of this is expected behavior, at least it's what I expect.

However, I'm trying to filter a DataFrame on a string value, and instead of dropping the rows that don't match, it converts the whole DataFrame to NaN wherever the value doesn't match and to "True" where it does. What am I missing?

Edit: Added in actual code sample

> testing

    result  Out Zone A  Out Zone B  In Zone C
0   2.0822  In          Out         In
1   2.0871  In          Out         In
2   2.1077  In          In          Out
3   2.0998  In          In          Out
4   2.1278  Out         In          Out
5   2.0767  In          Out         In
6   2.0725  In          Out         In
7   2.1023  In          In          Out
8   2.1296  In          In          Out
9   2.1193  In          In          Out
10  2.1017  In          In          Out
11  2.1017  In          In          Out
12  2.0913  In          In          Out

> testing["Out Zone A"] == "Out"

    Out Zone A
0   False
1   False
2   False
3   False
4   True
5   False
6   False
7   False
8   False
9   False
10  False
11  False
12  False

> testing[testing["Out Zone A"] == "Out"]

    result  Out Zone A  Out Zone B  In Zone C
0   NaN     NaN         NaN         NaN
1   NaN     NaN         NaN         NaN
2   NaN     NaN         NaN         NaN
3   NaN     NaN         NaN         NaN
4   NaN     True        NaN         NaN
5   NaN     NaN         NaN         NaN
6   NaN     NaN         NaN         NaN
7   NaN     NaN         NaN         NaN
8   NaN     NaN         NaN         NaN
9   NaN     NaN         NaN         NaN
10  NaN     NaN         NaN         NaN
11  NaN     NaN         NaN         NaN
12  NaN     NaN         NaN         NaN
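
One thing that may be worth checking here is whether the mask is really a 1-D Series. The post does not confirm this as the cause, and the sketch below does not reproduce the "True" cell shown above. It only illustrates a general pandas behavior: a Boolean DataFrame used as an indexer acts as a cell-by-cell mask, so the frame keeps its shape and non-matching cells become NaN. The data is a cut-down, made-up version of the table above, and row_mask, cell_mask, rows and cells are hypothetical names.

```python
import pandas as pd

# Cut-down stand-in for the "testing" frame (illustrative values only).
testing = pd.DataFrame({
    "result": [2.0822, 2.1278, 2.0767],
    "Out Zone A": ["In", "Out", "In"],
})

# 1-D mask (a Series): selects whole rows.
row_mask = testing["Out Zone A"] == "Out"
rows = testing[row_mask]

# 2-D mask (a DataFrame): the double brackets select a one-column
# DataFrame, so the comparison result is also a DataFrame.
cell_mask = testing[["Out Zone A"]] == "Out"
cells = testing[cell_mask]

print(type(row_mask).__name__, rows.shape)
print(type(cell_mask).__name__, cells.shape)
print(cells)
```

Printing type(mask) for the real data is a quick way to tell which of the two cases applies.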

CodePudding user response:

Well, I think I might know the issue: I just checked the version of pandas my work is running, and it's 0.23.0, which came out in 2018. I don't know for certain that that's the cause, but I suspect it is. Fantastic.
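
For anyone wanting to do the same check, a minimal sketch: the installed pandas version is exposed as the string pd.__version__. The variable name installed is just for illustration.

```python
import pandas as pd

# The installed pandas version as a string, e.g. "0.23.0".
installed = pd.__version__
print(installed)
```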

CodePudding user response:

I have a newer version of pandas (1.2.5) and the following code works perfectly:

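import pandas as pd

# Small example frame with a string column to filter on
d = {'Col1': ['in', 'out', 'in', 'out', 'in', 'out'],
     'Col2': ['1', '2', '3', '4', '5', '6']}

df = pd.DataFrame(d)
# Keep only the rows where Col1 equals 'in'
df = df[df['Col1'] == 'in']

print(df.to_string())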

Output:

  Col1 Col2
0   in    1
2   in    3
4   in    5