Values turn to NaN when indexing the keys in a dictionary in Pandas

I am trying to become a self-taught data analyst. In Pandas, when I pass different names as the index in the second part of the code, the values change from 450 to NaN and from 500 to NaN, and 380 becomes 380.0 (a float). The dtype also changes from int64 to float64. Any ideas why this happens? If I copy an example from w3schools, it is displayed fine.

import numpy as np
import pandas as pd


calories= {"Day 1": 450, "Day 2": 500, "day 3": 380}
new_series= pd.Series(calories)
print(new_series)

# Second part of code
new_series_1= pd.Series(calories, index=["day 1", "day 2", "day 3"])
print(new_series_1)

CodePudding user response：

I tried out your code. It is a simple fix. Python, like many programming languages, is case-sensitive, so the index labels must match the dictionary keys exactly. You just need to revise your statement.

Change from:

new_series_1= pd.Series(calories, index=["day 1", "day 2", "day 3"])

to:

new_series_1= pd.Series(calories, index=["Day 1", "Day 2", "day 3"])

Note the capital letters.

When I made sure that the index labels matched the dictionary keys, both print statements produced the same output:

Day 1    450
Day 2    500
day 3    380
dtype: int64
Day 1    450
Day 2    500
day 3    380
dtype: int64
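
If it helps to see which labels are the problem, here is a small sketch that lists the index labels that are not keys of the dictionary. The names wanted, missing and normalized are made up for illustration, and lower-casing the keys is only one possible workaround, not something from the original code.

```python
import pandas as pd

calories = {"Day 1": 450, "Day 2": 500, "day 3": 380}
wanted = ["day 1", "day 2", "day 3"]

# Index labels that do not exist as keys in the dict (the comparison is case-sensitive)
missing = [label for label in wanted if label not in calories]
print(missing)  # ['day 1', 'day 2']

# Assumed workaround: lower-case the keys first so capitalization no longer matters
normalized = {key.lower(): value for key, value in calories.items()}
new_series_1 = pd.Series(normalized, index=wanted)
print(new_series_1)
```

With the normalized keys, all three labels are found and the dtype stays int64.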

Hope that helps.

Regards.

CodePudding user response：

tl;dr

In new_series_1, the keys of calories don't match the index values, and the Series is reindexed with the latter, hence the NaN and float64.

Explanation

First you initialize new_series with calories, which is a dict with int values:

calories= {"Day 1": 450, "Day 2": 500, "day 3": 380}
new_series= pd.Series(calories)

All the values are integers, so Pandas infers int64 as the best-fitting dtype.

Then you pass two index values, day 1 and day 2, that are not capitalized and so differ from the dictionary keys:

new_series_1= pd.Series(calories, index=["day 1", "day 2", "day 3"])

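The index values day 1 and day 2 no longer correspond to any key in calories, so those two entries become NaN. NaN is a floating-point value that int64 cannot represent, so Pandas upcasts the whole Series to float64, which is also why 380 is displayed as 380.0. An example in the docs shows the same behaviour: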

Constructing Series from a dictionary with an Index specified

d = {'a': 1, 'b': 2, 'c': 3}
ser = pd.Series(data=d, index=['a', 'b', 'c'])
ser
a   1
b   2
c   3
dtype: int64

The keys of the dictionary match with the Index values, hence the Index values have no effect.

d = {'a': 1, 'b': 2, 'c': 3}
ser = pd.Series(data=d, index=['x', 'y', 'z'])
ser
x   NaN
y   NaN
z   NaN
dtype: float64

Note that the Index is first built with the keys from the dictionary. After this the Series is reindexed with the given Index values, hence we get all NaN as a result.
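
The two steps described in that note can be sketched by hand with the data from the question. This is only an illustration of the behaviour, not the actual Pandas internals; the names labels, direct and two_step are made up for the example.

```python
import pandas as pd

calories = {"Day 1": 450, "Day 2": 500, "day 3": 380}
labels = ["day 1", "day 2", "day 3"]

# Passing index= directly, as in the question
direct = pd.Series(calories, index=labels)

# Same thing in two explicit steps: build from the dict keys, then reindex
two_step = pd.Series(calories).reindex(labels)

print(two_step)
print(direct.equals(two_step))  # True
```

Only day 3 exists as a key, so it keeps its value (now shown as 380.0), while day 1 and day 2 are filled with NaN.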

And here the docs explain how the dtype is chosen when none is given:

If dtype is None, we find the dtype that best fits the data. If an actual dtype is provided, we coerce to that dtype if it’s safe. Otherwise, an error will be raised.
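
If you want the existing values to stay integers even when some labels are missing, one option is the nullable "Int64" extension dtype (capital I), which marks missing entries as <NA> instead of NaN. This is a minimal sketch rather than a recommendation; the names labels and nullable are made up for the example, and fixing the labels is still the real solution here.

```python
import pandas as pd

calories = {"Day 1": 450, "Day 2": 500, "day 3": 380}
labels = ["day 1", "day 2", "day 3"]

# Convert to the nullable integer dtype before reindexing,
# so missing labels become <NA> and 380 stays an integer
nullable = pd.Series(calories).astype("Int64").reindex(labels)
print(nullable)
```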