In the following code, I have a DataFrame with two rows and a Series with two values.
I would like to assign the Series values to the column of my DataFrame.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(2, 1), index=["one", "two"])
print(df)
s = pd.Series(np.random.randn(2), index=["four", "five"])
df.loc[:, 0] = s
print(df)
However, the Series and the DataFrame don't have the same index. This results in NaNs in the DataFrame.
0
one NaN
two NaN
In order to have my values in the column, I can simply use the .values
attribute of s.
df.loc[:, 0] = s.values
I would like to understand the logic behind getting NaNs in the first case.
CodePudding user response:
Before adding values to a Series/column, pandas aligns the indices.
This enables you to assign data when indices are missing or not in the same order.
For example:
df = pd.DataFrame(np.random.randn(2, 1), index=["one", "two"])
s = pd.Series([2, 1], index=["two", "one"]) # notice the different order
df.loc[:, 0] = s
print(df)
0
one 1
two 2
You can check what will happen using reindex:
s = pd.Series(np.random.randn(2), index=["four", "five"])
s.reindex(df.index)
one NaN
two NaN
dtype: float64
Using values/to_numpy() converts the Series to a NumPy array, so no reindexing is performed and the values are assigned by position.
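The difference between the two behaviours can be sketched side by side. This is a minimal example with made-up values, assuming the array has as many elements as the DataFrame has rows.

```python
import pandas as pd

# Hypothetical data with completely different index labels.
df = pd.DataFrame({0: [0.0, 0.0]}, index=["one", "two"])
s = pd.Series([1.5, 2.5], index=["four", "five"])

# What label alignment would produce: no common labels, so all NaN.
aligned = s.reindex(df.index)
print(aligned)

# A NumPy array carries no labels, so the values are written by position.
df.loc[:, 0] = s.to_numpy()
print(df)
```

Here "one" receives 1.5 and "two" receives 2.5, in the order the values appear in s. Since the labels are ignored, it is up to you to make sure that order is the one you want.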
CodePudding user response:
Because the index of df and the index of the Series s do not match.
On the flip side, s.values or s.to_list() converts the Series to an array or a list respectively, so no index-wise matching takes place. When the indices do match, assigning the Series directly works:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(2, 1), index=["one", "two"])
print(df)
s = pd.Series(np.random.randn(2), index=["one", "two"]) #edited here
df.loc[:, 0] = s
print(df)
0
one -0.560306
two -0.762751
0
one 0.281997
two 0.361495
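If you would rather keep working with a Series, one option is to check the indices first and relabel the Series before assigning. This is only a sketch with made-up values; it assumes the values in s are already in the row order of df.

```python
import pandas as pd

# Hypothetical data with mismatched index labels.
df = pd.DataFrame({0: [0.0, 0.0]}, index=["one", "two"])
s = pd.Series([1.5, 2.5], index=["four", "five"])

# False here: the labels differ, so a direct assignment would give NaN.
print(s.index.equals(df.index))

# set_axis returns a new Series with df's labels applied by position.
relabeled = s.set_axis(df.index)

# Alignment now finds a value for every row.
df.loc[:, 0] = relabeled
print(df)
```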