I have a pandas dataframe which I created as follows:
df = pd.DataFrame(columns= [["A","B","C"]] )
df["A"] = np.arange(1, 8761, 1)
Column A contains the values 1 to 8760, and columns B and C contain only NaN values.
df.info() returns the following:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8760 entries, 0 to 8759
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   (A,)    8760 non-null   int64
 1   (B,)    0 non-null      object
 2   (C,)    0 non-null      object
dtypes: int64(1), object(2)
memory usage: 205.4 KB
I'd like column B to have the same values as column A. When I try
df["B"] = df["A"]
column B still contains NaN values. But when I create a new column with
df["D"] = df["A"]
column D has the same values as column A.
I can get the same values in column B as in A, using
df.iloc[:,1] = df.iloc[:, 0]
But I am curious: why did the first attempt, df["B"] = df["A"], not work?
CodePudding user response:
This is because you are creating a MultiIndex:
df = pd.DataFrame(columns=[["A","B","C"]]) # <- note the list of lists
df["A"] = np.arange(1, 8761, 1)
df.columns
output:
MultiIndex([('A',),
            ('B',),
            ('C',)],
           )
The column labels are now one-element tuples such as ('B',) rather than plain strings. Thus you would need df[('B',)] = df[('A',)] to make the correct assignment.
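To see the difference between the two ways of passing columns, here is a minimal sketch that builds one empty frame each way and inspects the resulting column index. The variable names nested and flat are only for illustration.

```python
import pandas as pd

# A list of lists is interpreted as the levels of a MultiIndex (here: one level).
nested = pd.DataFrame(columns=[["A", "B", "C"]])

# A plain list gives an ordinary single-level Index.
flat = pd.DataFrame(columns=["A", "B", "C"])

# Compare the index type and the actual labels stored in each.
print(type(nested.columns).__name__, nested.columns.tolist())
print(type(flat.columns).__name__, flat.columns.tolist())
```

Checking isinstance(df.columns, pd.MultiIndex) is a quick way to confirm whether a frame was built with nested column labels by accident.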
If you want a simple index, the code should probably have been written as:
df = pd.DataFrame(columns=["A","B","C"])
df["A"] = np.arange(1, 8761, 1)
df["B"] = df["A"]
output:
>>> df.head()
   A  B    C
0  1  1  NaN
1  2  2  NaN
2  3  3  NaN
3  4  4  NaN
4  5  5  NaN
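If the frame already exists with the one-level MultiIndex and you would rather repair it than rebuild it, one possible approach (a sketch, not the only option) is to replace the columns with the values of level 0. This assumes the frame was created exactly as in the question.

```python
import numpy as np
import pandas as pd

# Reproduce the frame from the question: nested list -> one-level MultiIndex.
df = pd.DataFrame(columns=[["A", "B", "C"]])
df["A"] = np.arange(1, 8761, 1)

# Flatten the columns by keeping only the labels of level 0.
df.columns = df.columns.get_level_values(0)

# With plain string labels, the ordinary assignment copies the values.
df["B"] = df["A"]

print(df.columns)
print(df.head())
```

After flattening, df["A"] is an ordinary Series again, so the assignment behaves the same way as in the simple-index version above.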