series from dictionary using pandas

Time:02-01

# create series from dictionary using pandas
import pandas as pd

data_dict = {'Ahmed': 90, 'Ali': 85, 'Omar': 80}
series = pd.Series(data_dict, index=['Ahmed', 'Ali', 'Omar'])
print("Series :", series)
series2 = pd.Series(data_dict, index=['Ahmed', 'Ali', 'Omar', 'Karthi'])
print("Series 2 :", series2)

I tried this code while practising pandas and received the output below:

Series :
Ahmed    90
Ali      85
Omar     80
dtype: int64

Series 2 :
Ahmed     90.0
Ali       85.0
Omar      80.0
Karthi     NaN
dtype: float64

Question: Why did the data type change from int to float in Series 2?

I wanted to see what the output would be if I added an extra label to the index that does not exist in the dictionary. I got NaN for that label, but the dtype also changed from int64 to float64.

CodePudding user response:

When you pass a dictionary to pandas.Series, the keys are used as the index and the values as the data.

In fact you only need:

series = pd.Series(data_dict)

print(series)
Ahmed    90
Ali      85
Omar     80
dtype: int64

If you use a list as the source of the data, then the index parameter is useful:

series = pd.Series([90, 85, 80], index=['Ahmed','Ali','Omar'])

print(series)
Ahmed    90
Ali      85
Omar     80
dtype: int64

When you provide both a dictionary and an index, the index acts as a reindex:

series = pd.Series(data_dict, index=['Ahmed','Ali','Omar','Karthi'])

# equivalent to
series = pd.Series(data_dict).reindex(['Ahmed','Ali','Omar','Karthi'])

print(series)
Ahmed     90.0
Ali       85.0
Omar      80.0
Karthi     NaN
dtype: float64

In this case, labels that are missing from the dictionary are filled with NaN by default. NaN is a floating-point value, so the whole Series is upcast to float64.
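
To check the equivalence claimed above, here is a minimal sketch that builds the Series both ways and compares them. The variable names (labels, via_index, via_reindex) are just illustrative:

```python
import pandas as pd

data_dict = {'Ahmed': 90, 'Ali': 85, 'Omar': 80}
labels = ['Ahmed', 'Ali', 'Omar', 'Karthi']

# Build the Series with an explicit index, and via an explicit reindex
via_index = pd.Series(data_dict, index=labels)
via_reindex = pd.Series(data_dict).reindex(labels)

# Series.equals treats NaN values in the same position as equal
print(via_index.equals(via_reindex))

# The dtype before and after the missing label is introduced
print(pd.Series(data_dict).dtype, '->', via_index.dtype)

# Only the label that is absent from the dictionary is missing
print(via_index.isna())
```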

You can prevent the change by using the nullable Int64 dtype (note the capital I), which supports an integer NA:

series = pd.Series(data_dict, index=['Ahmed','Ali','Omar','Karthi'], dtype='Int64')
print(series)

Output:

Ahmed       90
Ali         85
Omar        80
Karthi    <NA>
dtype: Int64
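
If a plain default is acceptable instead of a missing marker, or if the float64 Series has already been created, two other routes should also give you integers. This is only a sketch reusing the data from the question; using 0 as the default value is an assumption made for illustration and may not suit real scores:

```python
import pandas as pd

data_dict = {'Ahmed': 90, 'Ali': 85, 'Omar': 80}
labels = ['Ahmed', 'Ali', 'Omar', 'Karthi']

# Option 1: give reindex an integer default, so no NaN is ever introduced
# (0 is an illustrative placeholder, not a recommendation)
filled = pd.Series(data_dict).reindex(labels, fill_value=0)
print(filled.dtype)

# Option 2: convert an already-upcast float64 Series to nullable Int64
promoted = pd.Series(data_dict, index=labels)
nullable = promoted.astype('Int64')
print(nullable)
```

Option 1 hides the fact that a label was missing, so Int64 is usually the safer choice when you need to tell "no value" apart from a real 0.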

CodePudding user response:

NaN is a special floating-point value (IEEE 754). There is no value for Karthi in series2, so that entry is automatically filled with NaN. Try replacing one of the integers with np.nan and you will see the same behavior. An integer NumPy array cannot hold NaN, so a Series that contains NaN is automatically upcast to float64.

import pandas as pd
import numpy as np

# np.nan is the canonical spelling (the np.NaN alias was removed in NumPy 2.0)
data_dict = {'Ahmed': 90, 'Ali': 85, 'Omar': np.nan}

series = pd.Series(data_dict, index=['Ahmed', 'Ali', 'Omar'])
print(series)

Output:

Ahmed    90.0
Ali      85.0
Omar      NaN
dtype: float64
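
The upcast can also be seen without any dictionary involved. A small sketch, assuming only NumPy and pandas are installed, showing that NaN is itself a float and that combining int64 with float64 resolves to float64:

```python
import numpy as np
import pandas as pd

# np.nan is an ordinary Python float
print(isinstance(np.nan, float))

# NumPy's type promotion for int64 combined with float64
print(np.result_type(np.int64, np.float64))

# A list mixing integers and NaN therefore becomes a float64 Series
mixed = pd.Series([90, 85, np.nan])
print(mixed.dtype)
```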