import pandas as pd

# Create a Series from a dictionary using pandas
data_dict = {'Ahmed': 90, 'Ali': 85, 'Omar': 80}
series = pd.Series(data_dict, index=['Ahmed', 'Ali', 'Omar'])
print("Series :", series)
# 'Karthi' is in the index but is not a key of the dictionary
series2 = pd.Series(data_dict, index=['Ahmed', 'Ali', 'Omar', 'Karthi'])
print("Series 2 :", series2)
I tried this code while practising pandas and received the following output:
Series :
Ahmed 90
Ali 85
Omar 80
dtype: int64
Series 2 :
Ahmed 90.0
Ali 85.0
Omar 80.0
Karthi NaN
dtype: float64
Question: Why did the data type change from int to float in Series 2?
I wanted to see what the output would be if I added an extra label to the index that does not belong to the dictionary. I got NaN, but the data type also changed from int to float.
Answer:
When you pass a dictionary to pandas.Series, the keys are used as the index and the values as the data.
In fact, you only need:
series = pd.Series(data_dict)
print(series)
Ahmed 90
Ali 85
Omar 80
dtype: int64
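A quick check of the point above, as a minimal sketch that reuses the question's data_dict: the dictionary keys become the index, the values become the data, and converting back with to_dict recovers the original mapping.

```python
import pandas as pd

data_dict = {'Ahmed': 90, 'Ali': 85, 'Omar': 80}
series = pd.Series(data_dict)

# Dictionary keys become the index labels, in insertion order
print(series.index.tolist())          # ['Ahmed', 'Ali', 'Omar']
# Dictionary values become the data
print(series.tolist())                # [90, 85, 80]
# Label-based lookup works like the original dict lookup
print(series['Ali'])                  # 85
# Converting back recovers the original mapping
print(series.to_dict() == data_dict)  # True
```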
If you use a list as the source of the data, then index is useful:
series = pd.Series([90, 85, 80], index=['Ahmed','Ali','Omar'])
print(series)
Ahmed 90
Ali 85
Omar 80
dtype: int64
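For contrast, here is a sketch (reusing the question's names and scores) of what happens with a list when the index has an extra label. Because list values are matched to labels by position, pandas raises a ValueError instead of filling in NaN. The exact error message varies between pandas versions, so only the exception type is relied on here.

```python
import pandas as pd

scores = [90, 85, 80]

# With a list, labels are paired with values by position, so the lengths must match
series = pd.Series(scores, index=['Ahmed', 'Ali', 'Omar'])
print(series['Ali'])  # 85

# Unlike the dictionary case, an extra label is not filled with NaN:
# there are 3 values for 4 labels, so pandas raises a ValueError
try:
    pd.Series(scores, index=['Ahmed', 'Ali', 'Omar', 'Karthi'])
except ValueError as exc:
    mismatch_error = exc
    print("ValueError:", exc)
```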
When you provide both, this acts as a reindex:
series = pd.Series(data_dict, index=['Ahmed','Ali','Omar','Karthi'])
# equivalent to
series = pd.Series(data_dict).reindex(['Ahmed','Ali','Omar','Karthi'])
print(series)
Ahmed 90.0
Ali 85.0
Omar 80.0
Karthi NaN
dtype: float64
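The equivalence can be checked directly. This sketch builds the Series both ways and compares them with Series.equals, which treats NaN values in the same position as equal.

```python
import pandas as pd

data_dict = {'Ahmed': 90, 'Ali': 85, 'Omar': 80}
labels = ['Ahmed', 'Ali', 'Omar', 'Karthi']

from_constructor = pd.Series(data_dict, index=labels)
from_reindex = pd.Series(data_dict).reindex(labels)

# Same labels, same values, NaN in the same place, same dtype
print(from_constructor.equals(from_reindex))  # True
print(from_constructor.dtype)                 # float64
```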
In this case, missing labels are filled with NaN as the default value. NaN is a floating-point value and the int64 dtype cannot represent it, so the whole Series is converted to float64.
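One alternative not mentioned in this answer: if a placeholder value is acceptable, Series.reindex accepts a fill_value argument. With an integer fill value no NaN is introduced, so the int64 dtype is kept. A minimal sketch, assuming 0 is an acceptable placeholder (it is an arbitrary choice here and cannot be told apart from a real score of 0):

```python
import pandas as pd

data_dict = {'Ahmed': 90, 'Ali': 85, 'Omar': 80}
labels = ['Ahmed', 'Ali', 'Omar', 'Karthi']

# Default: the missing label gets NaN, so the dtype becomes float64
with_nan = pd.Series(data_dict).reindex(labels)
print(with_nan.dtype)       # float64

# An integer fill value avoids NaN, so the dtype stays int64
# (0 is an arbitrary placeholder chosen for this sketch)
with_zero = pd.Series(data_dict).reindex(labels, fill_value=0)
print(with_zero.dtype)      # int64
print(with_zero['Karthi'])  # 0
```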
You can prevent the change by using the nullable Int64 dtype (capital I), which supports an integer missing value, <NA>:
series = pd.Series(data_dict, index=['Ahmed','Ali','Omar','Karthi'], dtype='Int64')
print(series)
Output:
Ahmed 90
Ali 85
Omar 80
Karthi <NA>
dtype: Int64
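If the float64 Series already exists, it can be converted afterwards. A minimal sketch: astype('Int64') turns whole-number floats back into integers and NaN into <NA>, and isna() still reports the missing entry.

```python
import pandas as pd

data_dict = {'Ahmed': 90, 'Ali': 85, 'Omar': 80}
labels = ['Ahmed', 'Ali', 'Omar', 'Karthi']

float_series = pd.Series(data_dict, index=labels)  # float64, Karthi is NaN

# Whole-number floats become integers; NaN becomes <NA>
nullable = float_series.astype('Int64')
print(nullable.dtype)            # Int64
print(nullable.isna().tolist())  # [False, False, False, True]
print(nullable['Ahmed'])         # 90
```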
Answer:
NaN is a special floating-point value defined by the IEEE 754 standard. There is no value for Karthi in series2, so it is automatically filled in with NaN. Try replacing one of the integers with np.nan and you will see the same behavior: a Series that contains a floating-point value is automatically cast to a floating-point dtype.
import pandas as pd
import numpy as np

# Use np.nan (lowercase): the np.NaN alias was removed in NumPy 2.0
data_dict = {'Ahmed': 90, 'Ali': 85, 'Omar': np.nan}
series = pd.Series(data_dict, index=['Ahmed', 'Ali', 'Omar'])
print(series)
Output:
Ahmed 90.0
Ali 85.0
Omar NaN
dtype: float64
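To see the IEEE 754 behavior and its consequence directly, here is a short sketch. The data mirrors the example above; the attempt to cast back to the plain int64 dtype is an illustration added here, and only the exception type (a ValueError subclass in current pandas) is relied on, not its message.

```python
import math

import numpy as np
import pandas as pd

# np.nan is an ordinary Python float holding the IEEE 754 NaN value
print(isinstance(np.nan, float))  # True
print(math.isnan(np.nan))         # True
# Under IEEE 754 rules, NaN compares unequal to everything, including itself
print(np.nan == np.nan)           # False

float_series = pd.Series({'Ahmed': 90, 'Ali': 85, 'Omar': np.nan})
print(float_series.dtype)         # float64

# Casting back to plain int64 fails, because int64 has no value meaning "missing"
try:
    float_series.astype('int64')
except ValueError as exc:
    cast_error = exc
    print(type(exc).__name__, exc)
```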