Why is this working?
import pandas as pd
numbers = {'mynumbers': [51, 52, 53, 54, 55]}
df = pd.DataFrame(numbers, columns=['mynumbers'])
df.loc[df['mynumbers'] <= 53, 'mynumbers'] = 'True'
print(df)
Output:
mynumbers
0 True
1 True
2 True
3 54
4 55
But this returns an error:
import pandas as pd
numbers = {'mynumbers': [51, 52, 53, 54, 55]}
df = pd.DataFrame(numbers, columns=['mynumbers'])
print(df.loc[df['mynumbers']])
If in the first case I can use df.loc[] with a condition on df['mynumbers'] to compare values, why do I get an error when I simply try to print df.loc[df['mynumbers']] on its own?
I understand that the values I pass to .loc raise a KeyError because no such keys exist in the index, but I do not understand why it works in the first case.
CodePudding user response:
When you do df['mynumbers'] <= 53, you use a boolean indexer, that is, a Series that has the same index as df and either True or False as values:
>>> df['mynumbers'] <= 53
0 True
1 True
2 True
3 False
4 False
Name: mynumbers, dtype: bool
This can be passed to df.loc[] or df[]:
>>> df[df['mynumbers'] <= 53]
mynumbers
0 51
1 52
2 53
>>> df.loc[df['mynumbers'] <= 53, :]
mynumbers
0 51
1 52
2 53
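As a minimal sketch of the boolean-indexer case, the snippet below builds the mask once and reuses it for both selection and assignment. The names mask, selected and the extra 'flag' column are assumptions made for illustration; they are not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({'mynumbers': [51, 52, 53, 54, 55]})

# Boolean Series sharing df's index: True where the condition holds.
mask = df['mynumbers'] <= 53

# Selecting with the mask keeps only the rows marked True.
selected = df.loc[mask, 'mynumbers']

# Assigning through the mask only touches the rows marked True.
# 'flag' is a hypothetical column, kept separate so that
# 'mynumbers' does not end up mixing integers and strings.
df['flag'] = 'False'
df.loc[mask, 'flag'] = 'True'

print(selected)
print(df)
```

Rows 3 and 4 are left alone by the masked assignment, which is why they keep their original values in 'mynumbers'.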
The other way to use df.loc[] is to pass in index values:
>>> df.loc[df.index]
mynumbers
0 51
1 52
2 53
3 54
4 55
>>> df.loc[df.index[3:]]
mynumbers
3 54
4 55
>>> df.loc[[1, 2]]
mynumbers
1 52
2 53
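The label-based case may be easier to see when the index is not the default 0 to 4. The sketch below assumes a hypothetical string index ('a' to 'e') so that labels cannot be mistaken for positions or for column values.

```python
import pandas as pd

# Hypothetical string labels, chosen only for illustration.
df = pd.DataFrame({'mynumbers': [51, 52, 53, 54, 55]},
                  index=['a', 'b', 'c', 'd', 'e'])

# .loc looks these values up in the index, not in the column.
picked = df.loc[['b', 'c']]

# A slice of the index is itself a collection of valid labels.
tail = df.loc[df.index[3:]]

print(picked)
print(tail)
```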
However, when you do df.loc[df['mynumbers']], you're doing neither of those two things. Because df['mynumbers'] is not boolean, pandas treats its values (51 to 55) as labels and tries to find them in the index, which only contains 0 to 4. That lookup fails, as shown by the following error:
KeyError: "None of [Int64Index([51, 52, 53, 54, 55], dtype='int64')] are in the [index]"
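To check this explanation, here is a small sketch that reproduces the failure and then shows the same expression succeeding once the index actually contains those values. The relabelled frame and the variable names are assumptions made for the demonstration, not something from the question.

```python
import pandas as pd

values = [51, 52, 53, 54, 55]
df = pd.DataFrame({'mynumbers': values})

# With the default index (0..4), none of 51..55 is a valid label.
try:
    df.loc[df['mynumbers']]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Hypothetical variant: the same data, but indexed by 51..55.
relabelled = pd.DataFrame({'mynumbers': values}, index=values)

# Now every value of the column is also a label, so the lookup works.
by_label = relabelled.loc[df['mynumbers']]

print(lookup_failed)
print(by_label)
```

So the first snippet in the question works because the comparison turns the column into a boolean mask, while the bare column is interpreted as a list of labels.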