I have a problem updating some values in a MultiIndex DataFrame. I have a DataFrame with multiple data types (bool, int, float):
import numpy as np
import pandas as pd
df = pd.DataFrame()
df['idx1'] = [0,1,2,3,4]
df['idx2'] = [0,1,2,3,4]
df['abool'] = True
df['a'] = np.arange(5, dtype='int64')
df['b'] = np.arange(5, dtype='float64')
df['c'] = np.arange(5, dtype='int64')
df = df.set_index(['idx1','idx2'])
df
The result is:
abool a b c
idx1 idx2
0 0 True 0 0.0 0
1 1 True 1 1.0 1
2 2 True 2 2.0 2
3 3 True 3 3.0 3
4 4 True 4 4.0 4
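For comparison, here is a sketch that builds the same frame in a single constructor call instead of assigning the columns one by one. The idea (an untested assumption, not something the question confirms) is that a one-shot constructor may store the two int64 columns a and c together internally from the start, so the frame is not reorganized later by a read like df.loc[idx, ['a', 'b']].

```python
import numpy as np
import pandas as pd

# Build every column at once; the scalar True is broadcast to all 5 rows.
df = pd.DataFrame(
    {
        "idx1": [0, 1, 2, 3, 4],
        "idx2": [0, 1, 2, 3, 4],
        "abool": True,
        "a": np.arange(5, dtype="int64"),
        "b": np.arange(5, dtype="float64"),
        "c": np.arange(5, dtype="int64"),
    }
).set_index(["idx1", "idx2"])

print(df.dtypes)
```

Whether this sidesteps the problem on pandas 1.4.0 has not been verified here.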
If I change some values, the changes are applied:
idx = [(1,1),(3,3)]
df.loc[idx, 'c'] = 0
df.loc[idx, 'c']
idx1 idx2
1 1 0
3 3 0
Name: c, dtype: int64
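In the assignment above, each tuple in idx is a complete (idx1, idx2) key into the MultiIndex. The sketch below makes that explicit by passing a MultiIndex object as the row indexer; the smaller frame and the name rows are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"idx1": range(5), "idx2": range(5), "c": np.arange(5, dtype="int64")}
).set_index(["idx1", "idx2"])

# Each tuple is one full (idx1, idx2) label of the MultiIndex.
rows = pd.MultiIndex.from_tuples([(1, 1), (3, 3)], names=["idx1", "idx2"])
df.loc[rows, "c"] = 0

print(df["c"].tolist())  # [0, 0, 2, 0, 4]
```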
But after I use loc to select other columns:
df.loc[idx, ['a', 'b']]
a b
idx1 idx2
1 1 1 1.0
3 3 3 3.0
Now, when I try to modify the values of column c again, the changes are not applied:
df.loc[idx, 'c'] = 15
df.loc[idx, 'c']
idx1 idx2
1 1 0
3 3 0
Name: c, dtype: int64
I can still make changes to other columns, but I can't modify the values I want in column c. Also, if I select column c with a list (which returns a DataFrame), I can see the changes, but if I select column c as a Series, I still can't see them:
df.loc[idx, ['c']]
c
idx1 idx2
1 1 15
3 3 15
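The difference described above is between df['c'], which returns a Series, and df[['c']], which returns a one-column DataFrame. A small check such as the one below can flag when the two disagree; column_views_agree is a hypothetical helper written for illustration, not a pandas function.

```python
import numpy as np
import pandas as pd


def column_views_agree(frame, col):
    """Hypothetical helper: True when frame[col] (a Series) and
    frame[[col]] (a one-column DataFrame) hold the same values."""
    as_series = frame[col]      # single label -> Series
    as_frame = frame[[col]]     # list of labels -> DataFrame
    return bool(np.array_equal(as_series.to_numpy(), as_frame.iloc[:, 0].to_numpy()))


df = pd.DataFrame(
    {"idx1": range(5), "idx2": range(5), "c": np.arange(5, dtype="int64")}
).set_index(["idx1", "idx2"])

print(type(df["c"]).__name__, type(df[["c"]]).__name__)  # Series DataFrame
print(column_views_agree(df, "c"))  # True on a freshly built frame
```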
I don't understand why this happens. Any help would be appreciated.
EDIT:
I recently ran this script on pandas 1.3.5 and it behaves as expected. The problem appears starting with pandas 1.4.0.
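Because the behavior depends on the pandas version, a self-contained check like the following can be run on each installed version. It replays the question's steps and returns what df['c'] and df[['c']] show for column c; the function name repro is hypothetical. On an affected version the two lists are expected to differ, as in the outputs in this thread.

```python
import numpy as np
import pandas as pd


def repro():
    """Replay the steps from the question; return (Series view, DataFrame view) of column c."""
    df = pd.DataFrame()
    df["idx1"] = [0, 1, 2, 3, 4]
    df["idx2"] = [0, 1, 2, 3, 4]
    df["abool"] = True
    df["a"] = np.arange(5, dtype="int64")
    df["b"] = np.arange(5, dtype="float64")
    df["c"] = np.arange(5, dtype="int64")
    df = df.set_index(["idx1", "idx2"])

    idx = [(1, 1), (3, 3)]
    df.loc[idx, "c"] = 0
    df.loc[idx, ["a", "b"]]  # the read that precedes the problem on 1.4.0
    df.loc[idx, "c"] = 15
    return df["c"].tolist(), df[["c"]]["c"].tolist()


print(pd.__version__)
series_view, frame_view = repro()
print("consistent" if series_view == frame_view else "df['c'] is stale (problem reproduced)")
```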
Answer:
You're not crazy; this is definitely a bug. I don't know whether it has already been reported.
import pandas as pd
import numpy as np
df = pd.DataFrame()
df['idx1'] = [0,1,2,3,4]
df['idx2'] = [0,1,2,3,4]
df['abool'] = True
df['a'] = np.arange(5, dtype='int64')
df['b'] = np.arange(5, dtype='float64')
df['c'] = np.arange(5, dtype='int64')
df = df.set_index(['idx1','idx2'])
idx = [(1,1),(3,3)]
df.loc[idx, 'c'] = 0
print(df.loc[idx, 'c'])
print(df.loc[idx, ['a', 'b']])  # <-- This line breaks everything, and I don't know why...
df.loc[idx, 'c'] = 15
print(df.loc[idx, 'c'])
Output with that line:
idx1 idx2
1 1 0
3 3 0
Name: c, dtype: int64
a b
idx1 idx2
1 1 1 1.0
3 3 3 3.0
idx1 idx2
1 1 0
3 3 0
Name: c, dtype: int64
--------------------------
>>> df['c']
idx1 idx2
0 0 0
1 1 0
2 2 2
3 3 0
4 4 4
Name: c, dtype: int64
>>> df[['c']]
c
idx1 idx2
0 0 0
1 1 15
2 2 2
3 3 15
4 4 4
Output without that line:
idx1 idx2
1 1 0
3 3 0
Name: c, dtype: int64
a b
idx1 idx2
1 1 1 1.0
3 3 3 3.0
idx1 idx2
1 1 15
3 3 15
Name: c, dtype: int64
--------------------------
>>> df['c']
idx1 idx2
0 0 0
1 1 15
2 2 2
3 3 15
4 4 4
Name: c, dtype: int64
>>> df[['c']]
c
idx1 idx2
0 0 0
1 1 15
2 2 2
3 3 15
4 4 4
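One workaround that might be worth trying on 1.4.0, assuming (not confirmed here) that the stale df['c'] comes from pandas reusing a Series it built earlier for that column: replace the frame with a copy after the writes. The copy is a new object built from the stored data, which the df[['c']] output above shows already contains 15. This has not been verified on 1.4.0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame()
df["idx1"] = [0, 1, 2, 3, 4]
df["idx2"] = [0, 1, 2, 3, 4]
df["abool"] = True
df["a"] = np.arange(5, dtype="int64")
df["b"] = np.arange(5, dtype="float64")
df["c"] = np.arange(5, dtype="int64")
df = df.set_index(["idx1", "idx2"])

idx = [(1, 1), (3, 3)]
df.loc[idx, "c"] = 0
df.loc[idx, ["a", "b"]]  # the read that precedes the stale result on 1.4.0
df.loc[idx, "c"] = 15

# Assumed workaround: work with a fresh copy so nothing cached on the
# old frame can be returned by df["c"].
df = df.copy()
print(df["c"].tolist())
```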
Answer:
I couldn't reproduce the problem: