Say one has a DataFrame, df, such that
>df
   Col1  Col2
0     6     0
1     8    12
2    10     4
3    -5     6
If you reference a column, that column is selected on its own:
>df['Col1']
   Col1
0     6
1     8
2    10
3    -5
If you then tack a Boolean expression onto the end, it'll evaluate for each row:
>df['Col1'] < 7
    Col1
0   True
1  False
2  False
3   True
If you then wrap that in brackets with the original DataFrame, it'll filter the original DataFrame:
>df[df['Col1'] < 7]
   Col1  Col2
0     6     0
3    -5     6
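The steps above can be reproduced as a short script. This is only a sketch: the way df is constructed here is an assumption based on the printed table, and the names col, mask and filtered are introduced just for illustration.

```python
import pandas as pd

# Rebuild the example frame from the printed table (construction assumed).
df = pd.DataFrame({"Col1": [6, 8, 10, -5], "Col2": [0, 12, 4, 6]})

col = df["Col1"]        # selecting a single column returns a Series
mask = df["Col1"] < 7   # the comparison is evaluated per row, giving a Boolean Series
filtered = df[mask]     # indexing with a Boolean Series keeps only the True rows

print(type(col).__name__)
print(mask.tolist())
print(filtered)
```

Checking type(mask) is a quick way to confirm that the comparison really produced a Series before using it as a row filter.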
All of this is expected behavior, or at least it's what I expect.
However, when I try to filter a DataFrame on a string, the non-matching rows aren't dropped. Instead, the whole DataFrame comes back with NaN for the values that don't match and True for the ones that do. What am I missing?
Edit: added the actual code sample.
> testing
    result Out Zone A Out Zone B In Zone C
0   2.0822         In        Out        In
1   2.0871         In        Out        In
2   2.1077         In         In       Out
3   2.0998         In         In       Out
4   2.1278        Out         In       Out
5   2.0767         In        Out        In
6   2.0725         In        Out        In
7   2.1023         In         In       Out
8   2.1296         In         In       Out
9   2.1193         In         In       Out
10  2.1017         In         In       Out
11  2.1017         In         In       Out
12  2.0913         In         In       Out
> testing["Out Zone A"] == "Out"
    Out Zone A
0        False
1        False
2        False
3        False
4         True
5        False
6        False
7        False
8        False
9        False
10       False
11       False
12       False
> testing[testing["Out Zone A"] == "Out"]
    result Out Zone A Out Zone B In Zone C
0      NaN        NaN        NaN       NaN
1      NaN        NaN        NaN       NaN
2      NaN        NaN        NaN       NaN
3      NaN        NaN        NaN       NaN
4      NaN       True        NaN       NaN
5      NaN        NaN        NaN       NaN
6      NaN        NaN        NaN       NaN
7      NaN        NaN        NaN       NaN
8      NaN        NaN        NaN       NaN
9      NaN        NaN        NaN       NaN
10     NaN        NaN        NaN       NaN
11     NaN        NaN        NaN       NaN
12     NaN        NaN        NaN       NaN
CodePudding user response:
Well, I think I might know the issue: I just checked the version of pandas my work is running... 0.23.0... which came out in 2018. I don't know that that's the issue, but I suspect that it is... fantastic...
CodePudding user response:
I have a newer version of pandas (1.2.5) and the following code works perfectly:
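import pandas as pd

# Two string columns; Col1 is the one being filtered on.
d = {'Col1': ['in', 'out', 'in', 'out', 'in', 'out'],
     'Col2': ['1', '2', '3', '4', '5', '6']}
df = pd.DataFrame(d)
# Keep only the rows where Col1 equals 'in'.
df = df[df['Col1'] == 'in']
print(df.to_string())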
Output:
  Col1 Col2
0   in    1
2   in    3
4   in    5
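For comparison, here is a hedged sketch of the same filter on a frame shaped like the one in the question. Nothing in this thread confirms the actual cause, so treat it as one possibility to rule out rather than a diagnosis: pandas filters rows when the mask is a Boolean Series, but when the mask is a Boolean DataFrame it keeps the original shape and puts NaN wherever the mask is not True. The three-row testing frame and the names series_mask and frame_mask are hypothetical, built here only for illustration.

```python
import pandas as pd

# Hypothetical three-row stand-in for the `testing` frame in the question.
testing = pd.DataFrame({
    "result": [2.0822, 2.1278, 2.0767],
    "Out Zone A": ["In", "Out", "In"],
    "Out Zone B": ["Out", "In", "Out"],
    "In Zone C": ["In", "Out", "In"],
})

# Boolean Series: one True/False per row.
series_mask = testing["Out Zone A"] == "Out"

# Boolean DataFrame with the same shape as `testing` (built by hand here).
frame_mask = pd.DataFrame(False, index=testing.index, columns=testing.columns)
frame_mask["Out Zone A"] = series_mask

rows = testing[series_mask]    # row filter: only the matching row is kept
masked = testing[frame_mask]   # element-wise mask: same shape, NaN where the mask is False

print(rows)
print(masked)
```

If isinstance(mask, pd.DataFrame) turns out to be true for the real mask, that would be consistent with the NaN-filled result, and it would be worth checking why the column lookup returns a DataFrame (duplicated column labels are one situation where that happens).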