I just came across this strange behaviour of pd.DataFrame.select_dtypes.
My pd.DataFrame is:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['a', 'b', 'c', 'd'], 'c': [1.2, 3.4, 5.6, 7.8]})
Now if I want to select the numeric columns, I would do:
df.select_dtypes([int, float])
But the output only contains the float column:
c
0 1.2
1 3.4
2 5.6
3 7.8
Why is that? I listed both float and int, so why doesn't it include the integer column?
Here are the dtypes:
>>> df.dtypes
a int64
b object
c float64
dtype: object
>>>
As you can see, both end with 64, but only float works.
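One way to investigate is to check which concrete NumPy scalar type the builtins int and float resolve to, and compare that with the actual column types. This is a diagnostic sketch, not a confirmed explanation: np.dtype(int) is platform dependent (it is int32 on Windows with NumPy older than 2.0, int64 on most other setups), so if the asker was on such a setup, int would point at int32 and not match the int64 column. The asker's platform is not stated in the question, and recent pandas versions may treat the builtin int differently.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['a', 'b', 'c', 'd'], 'c': [1.2, 3.4, 5.6, 7.8]})

# The concrete NumPy scalar types that the builtins int and float resolve to.
# np.dtype(int) is platform dependent (int32 on Windows with NumPy < 2.0).
int_type = np.dtype(int).type
float_type = np.dtype(float).type
print(int_type, float_type)

# Compare against the actual column types ('a' is int64, 'c' is float64).
print(df['a'].dtype.type is int_type)
print(df['c'].dtype.type is float_type)
```

If the first comparison prints False, the builtin int names a different integer width than the column has.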
More tests:
>>> df.select_dtypes(int)
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3]
>>> df.select_dtypes(float)
c
0 1.2
1 3.4
2 5.6
3 7.8
>>>
Why does this happen?
I know I could just do:
df.select_dtypes(['int64', 'float64'])
But I want to know the reason for this behavior.
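The exact-name workaround does select both columns here, but it only matches those precise widths. The sketch below casts column a to int32 (a hypothetical change, purely for illustration) to show that an int32 column is then silently dropped.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['a', 'b', 'c', 'd'], 'c': [1.2, 3.4, 5.6, 7.8]})

# Exact dtype names work for this frame...
exact = df.select_dtypes(['int64', 'float64'])
print(list(exact.columns))  # ['a', 'c']

# ...but miss other widths: hypothetically cast 'a' to int32.
df32 = df.astype({'a': 'int32'})
narrow = df32.select_dtypes(['int64', 'float64'])
print(list(narrow.columns))  # ['c']
```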
Answer:
If you need all integer and all float columns, check the NumPy type hierarchy: int16, int32 and int64 all match the abstract type integer, and the same principle applies to floats:
print (df.select_dtypes(['integer', 'floating']))
a c
0 1 1.2
1 2 3.4
2 3 5.6
3 4 7.8
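The string names above map to NumPy's abstract types, so passing np.integer and np.floating directly should give the same result; 'number' (np.number) selects every numeric column at once. In this sketch column a is cast to int32 (an illustrative assumption) to show that the abstract types still match a narrower width.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['a', 'b', 'c', 'd'], 'c': [1.2, 3.4, 5.6, 7.8]})
# Use a narrower integer width to show the abstract types still match it.
df = df.astype({'a': 'int32'})

by_name = df.select_dtypes(['integer', 'floating'])
by_type = df.select_dtypes([np.integer, np.floating])
all_numeric = df.select_dtypes('number')

print(list(by_name.columns), list(by_type.columns), list(all_numeric.columns))
```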
Reason: see the NumPy scalar types documentation:
Warning
The int_ type does not inherit from the int built-in under Python 3, because type int is no longer a fixed-width integer type.
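The hierarchy and the warning can be checked directly with issubclass. As far as I can tell, pandas selects a column when its dtype's scalar type is a subclass of one of the requested types, which is why the abstract types match every width; treat that description of pandas internals as an assumption. The snippet only inspects NumPy's own classes.

```python
import numpy as np

# Every fixed-width integer type sits under the abstract np.integer...
print(issubclass(np.int16, np.integer), issubclass(np.int32, np.integer), issubclass(np.int64, np.integer))
# ...and the float types sit under np.floating.
print(issubclass(np.float32, np.floating), issubclass(np.float64, np.floating))

# As the warning says, NumPy integers do not subclass Python's int...
print(issubclass(np.int64, int))  # False
# ...whereas np.float64 does subclass Python's float.
print(issubclass(np.float64, float))  # True

# np.issubdtype performs the same check starting from a dtype.
print(np.issubdtype(np.dtype('int32'), np.integer))  # True
```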