Pandas DataFrame column (Series) has different index than the DataFrame?

Consider this small script:

import pandas as pd

aa = pd.DataFrame({'a': [1,2,3]})
bb = aa.a
bb.index = bb.index + 1
aa['b'] = bb
print(aa)
print(aa.a - aa.b)

the output is:

while I was expecting aa.a - aa.b to be

0    NaN
1    1.0
2    1.0

How is this possible? Is it a Pandas bug?

Answer:

aa = pd.DataFrame({'a': [1,2,3]})
bb = aa.a
bb.index = bb.index + 1
aa['b'] = bb
aa = aa.reset_index(drop=True)  # add this; reset_index returns a new DataFrame, so assign it back

Your indexes do not match.

Answer:

When you do aa.a - aa.b, you're subtracting two pandas.Series that have the same length but not the same index:

aa.a

1    1
2    2
3    3
Name: a, dtype: int64

Whereas:

aa.b

0    NaN
1    1.0
2    2.0
Name: b, dtype: float64

And when you do:

print(aa.a - aa.b)

pandas aligns the two Series on their index labels before subtracting (the same happens for addition or any other arithmetic operation), so the indexes [0, 1, 2] and [1, 2, 3] are combined into their union, [0, 1, 2, 3]. A label that exists in only one of the two Series produces NaN.
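A minimal, standalone sketch of that label alignment. The two Series here are built by hand to mimic aa.a (labels 1, 2, 3) and aa.b (labels 0, 1, 2) from the outputs above, so no DataFrame is involved; the names `a`, `b`, `diff` and `positional` are only illustrative. Dropping to NumPy with `.to_numpy()` discards the labels and subtracts by position instead, which is what the question expected.

```python
import pandas as pd

# Hand-built stand-ins for aa.a (labels 1..3) and aa.b (labels 0..2)
a = pd.Series([1, 2, 3], index=[1, 2, 3], name='a')
b = pd.Series([float('nan'), 1.0, 2.0], index=[0, 1, 2], name='b')

# Label-based arithmetic: result index is the union of both indexes,
# and labels 0 and 3 (present in only one operand) become NaN.
diff = a - b
print(diff)

# Position-based arithmetic: labels are ignored, element i minus element i.
positional = a.to_numpy() - b.to_numpy()
print(positional)
```

Here `diff` has labels 0 to 3 with NaN at both ends and 0.0 at labels 1 and 2, while `positional` is `[nan, 1.0, 1.0]`.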

For instance, if you shift bb.index by 2 instead of 1:

bb.index = bb.index + 2

this time the resulting Series has 5 rows instead of 4, because the union of [0, 1, 2] and [2, 3, 4] is [0, 1, 2, 3, 4]. And so on:

bb.index = bb.index + 2
aa['b'] = bb
print(aa.a - aa.b)

0    NaN
1    NaN
2    0.0
3    NaN
4    NaN
dtype: float64
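The aa.a output above still carries the shifted labels even though aa itself keeps the index 0, 1, 2. A likely reason, hedged because it depends on the pandas version: without Copy-on-Write, `aa.a` can hand back the DataFrame's cached column Series itself, so `bb = aa.a` followed by `bb.index = ...` also relabels what `aa.a` returns later. The sketch below is one way to get the result the question expected and is not taken from either answer. It works on an explicit `.copy()` so the column is never relabelled, and it shows `Series.shift(1)` as an equivalent that never touches an index. The column name `c` is hypothetical.

```python
import pandas as pd

aa = pd.DataFrame({'a': [1, 2, 3]})

# Explicit copy: relabelling bb cannot affect the Series behind aa['a']
bb = aa['a'].copy()
bb.index = bb.index + 1   # bb now has labels 1, 2, 3

# Assignment aligns bb on aa's index (0, 1, 2); label 0 has no match -> NaN
aa['b'] = bb

# Same values without any index manipulation: move each value down one row
aa['c'] = aa['a'].shift(1)

# Both operands now share the index 0, 1, 2
diff = aa['a'] - aa['b']
print(diff)  # expected: 0 NaN, 1 1.0, 2 1.0
```

Because the copy leaves `aa['a']` with the DataFrame's own index, the subtraction lines up row by row. `shift` is usually the more direct way to express "previous row".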