When I create an empty DataFrame, the columns default to the float64 dtype. I tried to change the types to "int" and "object". With "object" it works, but why can't I change the type to "int"?
import pandas as pd
data = pd.DataFrame()
header_list = ['surname','name','second_name', 'sex', 'age', 'phone', 'region']
data = data.reindex(columns = header_list)
data[['surname', 'name', 'second_name', 'sex', 'region']] = data[['surname', 'name', 'second_name', 'sex', 'region']].astype('object')
data[['age', 'phone']] = data[['age', 'phone']].astype('int')
print(data.info())
CodePudding user response:
This is because the np.nan or NaN values (they are not exactly the same thing) you see in the DataFrame are of type float. It is a limitation that unfortunately cannot be avoided as long as you have NaN values in your data.
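As a quick illustration of that point (a minimal sketch, not from the original question), you can check the type of NaN directly and watch an integer column get promoted the moment a missing value appears:

```python
import numpy as np
import pandas as pd

# NaN is literally a Python float, so any column that holds one
# must be stored in a float-capable dtype
print(type(np.nan))   # <class 'float'>

# An int64 Series is promoted to float64 as soon as a missing value appears
s = pd.Series([1, 2, 3])
print(s.dtype)        # int64
s.loc[3] = np.nan     # enlarging assignment introduces a NaN
print(s.dtype)        # float64
```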
You can read more about this in the pandas documentation:
Because NaN is a float, a column of integers with even one missing value is cast to floating-point dtype (see Support for integer NA for more). pandas provides a nullable integer array, which can be used by explicitly requesting the dtype:
The proposed solution in said documentation is:
pd.Series([1, 2, np.nan, 4], dtype=pd.Int64Dtype())
Which returns:
0 1
1 2
2 <NA>
3 4
dtype: Int64
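Applied to the DataFrame from the question, the same nullable dtype can be requested with the 'Int64' string (a sketch; the capital I is the key difference from plain 'int'):

```python
import pandas as pd

data = pd.DataFrame()
header_list = ['surname', 'name', 'second_name', 'sex', 'age', 'phone', 'region']
data = data.reindex(columns=header_list)

# 'Int64' (capital I) is pandas' nullable integer dtype; unlike 'int',
# it can hold missing values (shown as <NA>) without falling back to float
data[['age', 'phone']] = data[['age', 'phone']].astype('Int64')
print(data.dtypes['age'])   # Int64
```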
CodePudding user response:
I tried your code in VS Code and got the following output, without modifying your code at all.
Data columns (total 7 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 surname 0 non-null object
1 name 0 non-null object
2 second_name 0 non-null object
3 sex 0 non-null object
4 age 0 non-null int32
5 phone 0 non-null int32
6 region 0 non-null object