When I create an empty DataFrame, the columns default to the float64 dtype. I tried to change the types to "int" and "object". With "object" it works, but why can't I change the type to "int"?
import pandas as pd
data = pd.DataFrame()
header_list = ['surname','name','second_name', 'sex', 'age', 'phone', 'region']
data = data.reindex(columns = header_list)
data[['surname', 'name', 'second_name', 'sex', 'region']] = data[['surname', 'name', 'second_name', 'sex', 'region']].astype('object')
data[['age', 'phone']] = data[['age', 'phone']].astype('int')
print(data.info())
CodePudding user response:
This is because the np.nan or NaN values (they are not exactly the same thing) you see in the DataFrame are of type float. It is a limitation that unfortunately cannot be avoided as long as you have NaN values in your data.
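As a quick illustration of that point (a minimal sketch, not from the original question), you can check the type of NaN directly and watch an integer column get promoted the moment a missing value appears:

```python
import numpy as np
import pandas as pd

# NaN is literally a Python float, so any column that holds one
# must be stored in a float-capable dtype
print(type(np.nan))   # <class 'float'>

# An int64 Series is promoted to float64 as soon as a missing value appears
s = pd.Series([1, 2, 3])
print(s.dtype)        # int64
s.loc[3] = np.nan     # enlarging assignment introduces a NaN
print(s.dtype)        # float64
```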
You can read more about this in the pandas documentation:
Because NaN is a float, a column of integers with even one missing value is cast to floating-point dtype (see Support for integer NA for more). pandas provides a nullable integer array, which can be used by explicitly requesting the dtype:
The proposed solution in said documentation is:
pd.Series([1, 2, np.nan, 4], dtype=pd.Int64Dtype())
Which returns:
0 1
1 2
2 <NA>
3 4
dtype: Int64
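Applied to the DataFrame from the question, the same nullable dtype can be requested with the 'Int64' string (a sketch; the capital I is the key difference from plain 'int'):

```python
import pandas as pd

data = pd.DataFrame()
header_list = ['surname', 'name', 'second_name', 'sex', 'age', 'phone', 'region']
data = data.reindex(columns=header_list)

# 'Int64' (capital I) is pandas' nullable integer dtype; unlike 'int',
# it can hold missing values (shown as <NA>) without falling back to float
data[['age', 'phone']] = data[['age', 'phone']].astype('Int64')
print(data.dtypes['age'])   # Int64
```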
CodePudding user response:
I tried your code in VS Code and got the following output, without modifying your code at all.
Data columns (total 7 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 surname 0 non-null object
1 name 0 non-null object
2 second_name 0 non-null object
3 sex 0 non-null object
4 age 0 non-null int32
5 phone 0 non-null int32
6 region 0 non-null object