Why does setting a column in a DataFrame from a Series with a different index produce a column of NaNs?

In the following code, I have a DataFrame with two rows and a Series with two values.

I would like to set the Series values in the column of my DataFrame.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(2, 1), index=["one", "two"])
print(df)
s = pd.Series(np.random.randn(2), index=["four", "five"])
df.loc[:, 0] = s
print(df)

However, the Series and the DataFrame do not have the same index, and the assignment fills the column with NaNs:

      0
one NaN
two NaN

To get my values into the column, I can use the .values attribute of s:

df.loc[:, 0] = s.values

I would like to understand the logic behind getting NaNs in the first case.

Answer:

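Before assigning a Series to a column, pandas aligns the Series on the DataFrame's index: each row takes the value whose label matches its own, and rows whose label does not appear in the Series get NaN. Labels in the Series that have no matching row are dropped.

This lets you assign data when some labels are missing or when the labels are in a different order.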

For example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(2, 1), index=["one", "two"])
s = pd.Series([2, 1], index=["two", "one"])  # notice the different order

df.loc[:, 0] = s
print(df)

     0
one  1
two  2

You can preview what the assignment will produce with reindex, which performs the same alignment against df.index:

s = pd.Series(np.random.randn(2), index=["four", "five"])
print(s.reindex(df.index))

one   NaN
two   NaN
dtype: float64

Using .values or .to_numpy() converts the Series to a NumPy array. An array has no index, so no alignment is performed and the values are assigned by position.
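A sketch of two ways to keep the values, assuming the goal is to pair them with the rows in order (data is made up). The first strips the index as described above; the second, using Series.set_axis, is an alternative not mentioned in the answer that relabels s with df's index so that alignment succeeds. The last part shows that, without an index to align on, the lengths must agree:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [0.0, 0.0]}, index=["one", "two"])
s = pd.Series([3.0, 4.0], index=["four", "five"])  # labels unrelated to df's

# Option 1: drop the labels; values are matched to rows by position
df.loc[:, 0] = s.to_numpy()

# Option 2: relabel s with df's index, then let alignment match them
df2 = pd.DataFrame({0: [0.0, 0.0]}, index=["one", "two"])
df2.loc[:, 0] = s.set_axis(df2.index)

# A plain array of the wrong length cannot be matched to the rows
raised = False
try:
    df[0] = np.array([1.0, 2.0, 3.0])
except ValueError:
    raised = True

print(df)
print(df2)
print("length mismatch raised ValueError:", raised)
```

Both options give the same column here; set_axis keeps the operation label-based, which can make the intent clearer when reading the code later.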

Answer:

Because the index of df and the index of the Series s do not match.

On the other hand, s.values or s.to_list() converts the Series to an array or a list respectively, so no index-wise matching takes place. Alternatively, give s the same index as df, as in the code below:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(2, 1), index=["one", "two"])
print(df)
s = pd.Series(np.random.randn(2), index=["one", "two"])  # same index as df, so the values align
df.loc[:, 0] = s
print(df)

            0
one -0.560306
two -0.762751

            0
one  0.281997
two  0.361495
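When the cause of unexpected NaNs is not obvious, the two indexes can be compared before assigning. A minimal diagnostic sketch using Index.equals and Index.difference (the data is made up):

```python
import pandas as pd

df = pd.DataFrame({0: [0.0, 0.0]}, index=["one", "two"])
s = pd.Series([1.0, 2.0], index=["four", "five"])

# True only if both indexes hold the same labels in the same order
same_index = s.index.equals(df.index)

# Rows of df with no matching label in s (these would become NaN)
missing_in_s = df.index.difference(s.index)
# Labels of s with no matching row in df (these would be dropped)
dropped_from_s = s.index.difference(df.index)

print(same_index, list(missing_in_s), list(dropped_from_s))
```

Here every row of df is missing from s, which is exactly why the column in the question ends up entirely NaN.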