I've run into some strange behavior with pandas .astype() (I'm using version 1.5.2). After casting a column to integer, df.dtypes reports the expected types. But when I extract the values by row, the integer column comes back as float.
Code:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(3, 3))
df.loc[:, 0] = df.loc[:, 0].astype(int)
print(df)
print(df.dtypes)
print(df.iloc[0, :])
print(type(df.values[0, 0]))
Out:
   0         1         2
0  0 -0.232432  1.025643
1 -1  0.556968 -0.729378
2 -1  1.285546 -0.541676
0      int64
1    float64
2    float64
dtype: object
0    0.000000
1   -0.232432
2    1.025643
Name: 0, dtype: float64
<class 'numpy.float64'>
Any idea what I'm doing wrong here?
I also tried the assignment without loc:
df[0] = df[0].astype(int)
but that didn't work either.
CodePudding user response:
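I think this is due to the use of df.values, which returns a NumPy representation of the DataFrame. A NumPy array has a single dtype, so the int64 column is upcast to float64 to match the other columns. The same happens with df.iloc[0, :]: a row is returned as one Series, which also has a single dtype. As per the docs:
"By default, the dtype of the returned array will be the common NumPy dtype of all types in the DataFrame."
You can check the common dtype of your columns like this: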
>>> from pandas.core.dtypes.cast import find_common_type
>>> find_common_type(df.dtypes.to_list()) # df is your dataframe
dtype('float64')
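If you need the integer to stay an integer when reading values out, here is a minimal sketch of a few options. It assumes a frame shaped like the one in the question (the seeded random data and the variable names are only for illustration), and it is not the only way to do this.

```python
import numpy as np
import pandas as pd

# Illustrative frame: one integer column, two float columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((3, 3)))
df[0] = df[0].astype("int64")

# Scalar access goes through the column, so the column dtype is kept.
scalar = df.at[0, 0]
print(type(scalar))

# Row access builds a single Series / array, so the values are upcast.
row = df.iloc[0, :]
arr = df.to_numpy()
print(row.dtype, arr.dtype)

# Requesting object dtype avoids the upcast: integers stay integers.
obj_arr = df.to_numpy(dtype=object)
print(type(obj_arr[0, 0]), type(obj_arr[0, 1]))

# itertuples builds each row from the individual columns,
# so each element keeps the kind of value its column holds.
first = next(df.itertuples(index=False))
print(type(first[0]), type(first[1]))
```

The object array gives up fast vectorized math, so it is mostly useful when you only need to read the values back with their original kinds.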