I am trying to become a self-taught data analyst. In Pandas, when I index different names in the second part of the code, 450 turns into NaN, 500 turns into NaN, and 380 becomes 380.0 (a float). The dtype also changes from int64 to float64. Any ideas why this happens? Also, if I copy an example from w3schools, it is displayed fine.
import numpy as np
import pandas as pd
calories= {"Day 1": 450, "Day 2": 500, "day 3": 380}
new_series= pd.Series(calories)
print(new_series)
# Second part of code
new_series_1= pd.Series(calories, index=["day 1", "day 2", "day 3"])
print(new_series_1)
Answer:
I tried out your code, and it is a simple fix. Python, like many programming languages, is case-sensitive, so "Day 1" and "day 1" are different labels. You just need to revise your statement.
Change from:
new_series_1= pd.Series(calories, index=["day 1", "day 2", "day 3"])
to:
new_series_1= pd.Series(calories, index=["Day 1", "Day 2", "day 3"])
Note the capital letters.
When I made sure that the index labels matched the dictionary keys, both Series printed the same output:
Day 1 450
Day 2 500
day 3 380
dtype: int64
Day 1 450
Day 2 500
day 3 380
dtype: int64
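A minimal sketch of the point above, reusing the calories dict from the question. The variable name `fixed` is only illustrative. It shows that the two spellings compare as different strings, and that once the labels match exactly, the values are found and the integer dtype is kept.

```python
import pandas as pd

calories = {"Day 1": 450, "Day 2": 500, "day 3": 380}

# Python string comparison is case-sensitive, so these are two different labels.
print("Day 1" == "day 1")  # False

# Index labels that exactly match the dict keys find every value,
# so no NaN is introduced and the integer dtype is kept.
fixed = pd.Series(calories, index=["Day 1", "Day 2", "day 3"])
print(fixed.dtype)  # int64
```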
Hope that helps.
Regards.
Answer:
tl;dr
In new_series_1, two of the calories keys ("Day 1" and "Day 2") don't match the index values ("day 1" and "day 2"), and the Series is reindexed with the latter, hence the NaN values and the float64 dtype.
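One quick way to check this, assuming the question's data: list the index labels that are not keys of calories, then see which entries came out as NaN. The names `labels`, `missing` and `s` are just illustrative.

```python
import pandas as pd

calories = {"Day 1": 450, "Day 2": 500, "day 3": 380}
labels = ["day 1", "day 2", "day 3"]

# Labels with no matching key in the dict (the check is case-sensitive).
missing = [label for label in labels if label not in calories]
print(missing)  # ['day 1', 'day 2']

# Exactly those labels become NaN after the Series is reindexed.
s = pd.Series(calories, index=labels)
print(s.isna().tolist())  # [True, True, False]
print(s.dtype)  # float64
```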
Explanation
First, you initialize new_series with calories, which is a dict with int values:
calories= {"Day 1": 450, "Day 2": 500, "day 3": 380}
new_series= pd.Series(calories)
So Pandas infers that the values are best stored as int64.
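To confirm what Pandas inferred in that first step, a small sketch with the question's dict: without an explicit index, the dict keys become the index, and since every value is an int with nothing missing, the dtype is int64.

```python
import pandas as pd

calories = {"Day 1": 450, "Day 2": 500, "day 3": 380}
new_series = pd.Series(calories)

# Without an explicit index, the dict keys become the index (in insertion order).
print(new_series.index.tolist())  # ['Day 1', 'Day 2', 'day 3']

# All values are Python ints and none are missing, so int64 is inferred.
print(new_series.dtype)  # int64
```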
Then you pass index labels in which day 1 and day 2 are not capitalized, so they differ from the dict keys:
new_series_1= pd.Series(calories, index=["day 1", "day 2", "day 3"])
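Those two labels no longer correspond to any of the keys in calories, so Pandas fills them with NaN. NaN is a floating-point value and int64 cannot represent it, so the whole Series is upcast to float64, which is also why 380 is shown as 380.0.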
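The upcast comes from NaN itself, which can be checked directly. This sketch builds the same NaN/380 mix by hand instead of going through the dict, just to isolate the dtype behavior; the name `mixed` is illustrative.

```python
import numpy as np
import pandas as pd

# NaN is a Python float; NumPy's int64 has no way to represent a missing value.
print(type(np.nan))  # <class 'float'>

# Mixing integers with NaN therefore forces the whole Series to float64,
# which is why 380 is displayed as 380.0.
mixed = pd.Series([np.nan, np.nan, 380], index=["day 1", "day 2", "day 3"])
print(mixed.dtype)  # float64
print(mixed["day 3"])  # 380.0
```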
In fact, an example in the docs shows exactly that:
Constructing Series from a dictionary with an Index specified
d = {'a': 1, 'b': 2, 'c': 3}
ser = pd.Series(data=d, index=['a', 'b', 'c'])
ser
a 1
b 2
c 3
dtype: int64
The keys of the dictionary match with the Index values, hence the Index values have no effect.
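When the labels are a subset or a reordering of the keys, index selects and orders the entries, and because every label is found, nothing becomes NaN and the dtype stays int64. A small sketch using the docs' dict; the ['c', 'a'] index is my own illustrative choice.

```python
import pandas as pd

d = {'a': 1, 'b': 2, 'c': 3}

# Every requested label exists in d, so no NaN is introduced.
ser = pd.Series(data=d, index=['c', 'a'])
print(ser.tolist())  # [3, 1]
print(ser.dtype)  # int64
```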
d = {'a': 1, 'b': 2, 'c': 3}
ser = pd.Series(data=d, index=['x', 'y', 'z'])
ser
x NaN
y NaN
z NaN
dtype: float64
Note that the Index is first built with the keys from the dictionary. After this the Series is reindexed with the given Index values, hence we get all NaN as a result.
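A sketch of that reindexing step, assuming (as the docs' description suggests) that passing index to the constructor gives the same result as building from the dict alone and then calling Series.reindex. A partially matching index is used here so both a found value and a NaN show up; the variable names are illustrative.

```python
import pandas as pd

d = {'a': 1, 'b': 2, 'c': 3}

# Passing index to the constructor...
via_constructor = pd.Series(data=d, index=['a', 'x', 'c'])
# ...versus building from the dict alone and reindexing afterwards.
via_reindex = pd.Series(d).reindex(['a', 'x', 'c'])

print(via_constructor.tolist())  # [1.0, nan, 3.0]
# equals() treats NaNs in the same positions as equal.
print(via_constructor.equals(via_reindex))  # True
```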
And here the docs explain how the dtype of the Index is chosen:
If dtype is None, we find the dtype that best fits the data. If an actual dtype is provided, we coerce to that dtype if it’s safe. Otherwise, an error will be raised.
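A short sketch of that rule in action, with values borrowed from the question. With no dtype, Pandas picks the best fit; with an explicit dtype, safe coercions such as int to float64 are applied. The variable names are illustrative, and the last two lines assume Series values follow the same best-fit inference.

```python
import pandas as pd

# No dtype given: pandas infers the type that best fits the data.
int_index = pd.Index([450, 500, 380])
float_index = pd.Index([450, 500, 380.5])
print(int_index.dtype)    # int64
print(float_index.dtype)  # float64

# dtype given: integers can be safely coerced to float64.
coerced = pd.Index([450, 500, 380], dtype="float64")
print(coerced.dtype)  # float64

# Series values built from the question's dict follow the same inference.
calories = {"Day 1": 450, "Day 2": 500, "day 3": 380}
print(pd.Series(calories).dtype)                   # int64
print(pd.Series(calories, dtype="float64").dtype)  # float64
```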