Why can't I replace a column of NaN values with values from another column directly?

I have a pandas dataframe which I created as follows:

df = pd.DataFrame(columns= [["A","B","C"]] )
df["A"] = np.arange(1, 8761, 1)

Column A contains the values 1 to 8760, and columns B and C contain only NaN.

df.info() returns the following:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8760 entries, 0 to 8759
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   (A,)    8760 non-null   int64 
 1   (B,)    0 non-null      object
 2   (C,)    0 non-null      object
dtypes: int64(1), object(2)
memory usage: 205.4  KB

I'd like to have the same value in column B as column A. When I try

df["B"] = df["A"]

column B still has NaN values. But when I create a new column,

df["D"] = df["A"]

column D has the same values as column A.

I can get the same values in column B as in A, using

df.iloc[:,1] = df.iloc[:, 0]

But I am curious why it did not work the first time, using

df["B"] = df["A"]

CodePudding user response:

This is because passing a nested list as columns creates a MultiIndex:

df = pd.DataFrame(columns=[["A","B","C"]]) # <- note the list of list
df["A"] = np.arange(1, 8761, 1)
df.columns

output:

MultiIndex([('A',),
            ('B',),
            ('C',)],
           )

Thus you would need df[('B',)] = df[('A',)] to make the correct assignment.
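
If the frame has already been built with the nested list, one option is to detect the one-level MultiIndex and flatten it before assigning. The following is only a minimal sketch: the names nested, flat and repaired are illustrative, and a five-row index is assumed to keep the example small.

```python
import numpy as np
import pandas as pd

# A nested list is read as "one array per level", so the columns become
# a one-level MultiIndex; a flat list gives an ordinary Index.
nested = pd.DataFrame(index=range(5), columns=[["A", "B", "C"]])
flat = pd.DataFrame(index=range(5), columns=["A", "B", "C"])

nested_is_multi = isinstance(nested.columns, pd.MultiIndex)
flat_is_multi = isinstance(flat.columns, pd.MultiIndex)

# Flatten the single level back into plain string labels.
repaired = nested.copy()
repaired.columns = repaired.columns.get_level_values(0)

# With plain labels, ordinary column assignment works on the existing column.
repaired["A"] = np.arange(1, 6)
repaired["B"] = repaired["A"]

print(nested_is_multi, flat_is_multi)
print(repaired.head())
```

Checking isinstance(df.columns, pd.MultiIndex) is a quick way to confirm whether the labels are tuples such as ('B',) or plain strings such as "B".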

If you want a simple index, the code should probably have been written as:

df = pd.DataFrame(columns=["A","B","C"])
df["A"] = np.arange(1, 8761, 1)
df["B"] = df["A"]

output:

>>> df.head()
   A  B    C
0  1  1  NaN
1  2  2  NaN
2  3  3  NaN
3  4  4  NaN
4  5  5  NaN