Problem Updating Values from MultiIndex Dataframe after using loc

Time:07-27

I have a problem updating some values in a MultiIndex DataFrame. The DataFrame has columns of several data types (bool, int, float):

import pandas as pd
import numpy as np

df = pd.DataFrame()
df['idx1'] = [0,1,2,3,4]
df['idx2'] = [0,1,2,3,4]
df['abool'] = True
df['a'] = np.arange(5, dtype='int64')
df['b'] = np.arange(5, dtype='float64')
df['c'] = np.arange(5, dtype='int64')
df = df.set_index(['idx1','idx2'])

df

The result is

           abool  a    b  c
idx1 idx2                  
0    0      True  0  0.0  0
1    1      True  1  1.0  1
2    2      True  2  2.0  2
3    3      True  3  3.0  3
4    4      True  4  4.0  4

If I change some values, the changes are applied:

idx = [(1,1),(3,3)]
df.loc[idx, 'c'] = 0
df.loc[idx, 'c']
idx1  idx2
1     1       0
3     3       0
Name: c, dtype: int64

But then I read two other columns with loc:

df.loc[idx, ['a', 'b']]
           a    b
idx1 idx2        
1    1     1  1.0
3    3     3  3.0

After that, when I try to modify the values in column c again, the changes don't appear to be applied:

df.loc[idx, 'c'] = 15
df.loc[idx, 'c']
idx1  idx2
1     1       0
3     3       0
Name: c, dtype: int64

I can still make changes to other columns, but not to column c, which is the one I want to modify. Also, if I pass column c in a list (so that loc returns a DataFrame), I can see the changes, but if I select column c as a Series, I still can't see them.

df.loc[idx, ['c']]
            c
idx1 idx2    
1    1     15
3    3     15

I can't understand why this happens. I would appreciate any help.

EDIT:

I recently ran this script on pandas 1.3.5 and it behaved as expected. Starting with pandas 1.4.0, I get the problem.
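
For anyone comparing notes, a quick way to see which side of that version boundary an installation falls on is sketched below. The helper name major_minor and the constant FIRST_REPORTED_BAD are made up for this example, and the check only reflects what is reported in this thread (fine on 1.3.5, broken on 1.4.0); it says nothing about which later releases may or may not be affected.

```python
import pandas as pd

# First release line in which the problem was reported in this thread.
FIRST_REPORTED_BAD = (1, 4)


def major_minor(version: str) -> tuple:
    """Return (major, minor) as ints from a version string such as '1.4.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


installed = major_minor(pd.__version__)
print("pandas", pd.__version__)
if installed < FIRST_REPORTED_BAD:
    print("older than 1.4: the thread reports the expected behavior on 1.3.5")
else:
    print("1.4 or newer: run the reproduction to see whether this build is affected")
```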

CodePudding user response:

You're not crazy; this is definitely a bug. I don't know whether it has already been reported.

import pandas as pd
import numpy as np

df = pd.DataFrame()
df['idx1'] = [0,1,2,3,4]
df['idx2'] = [0,1,2,3,4]
df['abool'] = True
df['a'] = np.arange(5, dtype='int64')
df['b'] = np.arange(5, dtype='float64')
df['c'] = np.arange(5, dtype='int64')
df['c'] = np.arange(5)
df = df.set_index(['idx1','idx2'])

idx = [(1,1),(3,3)]
df.loc[idx, 'c'] = 0
df.loc[idx, 'c']
df.loc[idx, ['a', 'b']] # <-- This line screws everything up, and idk why...
df.loc[idx, 'c'] = 15
df.loc[idx, 'c']

Output with that line:

idx1  idx2
1     1       0
3     3       0
Name: c, dtype: int64

           a    b
idx1 idx2
1    1     1  1.0
3    3     3  3.0

idx1  idx2
1     1       0
3     3       0
Name: c, dtype: int64
--------------------------
>>> df['c']
idx1  idx2
0     0       0
1     1       0
2     2       2
3     3       0
4     4       4
Name: c, dtype: int64
>>> df[['c']]
            c
idx1 idx2
0    0      0
1    1     15
2    2      2
3    3     15
4    4      4

Output without that line:

idx1  idx2
1     1       0
3     3       0
Name: c, dtype: int64

idx1  idx2
1     1       15
3     3       15
Name: c, dtype: int64
--------------------------
>>> df['c']
idx1  idx2
0     0        0
1     1       15
2     2        2
3     3       15
4     4        4
Name: c, dtype: int64
>>> df[['c']]
            c
idx1 idx2
0    0      0
1    1     15
2    2      2
3    3     15
4    4      4
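
The two outputs above differ only in what df['c'] shows, while df[['c']] has the new values either way. A small check along these lines may help tell whether a given installation is affected. It is only a sketch: build_frame and views_agree are names invented here, and the script simply replays the steps from the question and compares the two ways of reading column c.

```python
import numpy as np
import pandas as pd


def build_frame() -> pd.DataFrame:
    """Build the frame column by column, the same way as in the question."""
    df = pd.DataFrame()
    df["idx1"] = [0, 1, 2, 3, 4]
    df["idx2"] = [0, 1, 2, 3, 4]
    df["abool"] = True
    df["a"] = np.arange(5, dtype="int64")
    df["b"] = np.arange(5, dtype="float64")
    df["c"] = np.arange(5, dtype="int64")
    return df.set_index(["idx1", "idx2"])


def views_agree(df: pd.DataFrame, col: str) -> bool:
    """Compare the column read as a Series with the same column read as a DataFrame."""
    as_series = df[col].to_numpy()
    as_frame = df[[col]].to_numpy().ravel()
    return bool((as_series == as_frame).all())


df = build_frame()
idx = [(1, 1), (3, 3)]
df.loc[idx, "c"] = 0
df.loc[idx, "c"]           # read column c as a Series
df.loc[idx, ["a", "b"]]    # the read that triggers the problem in this thread
df.loc[idx, "c"] = 15

frame_values = df[["c"]].to_numpy().ravel().tolist()
print("df[['c']] ->", frame_values)
print("df['c']   ->", df["c"].tolist())
print("views agree:", views_agree(df, "c"))
```

If the last line prints False, the installation shows the same mismatch as the "with that line" output; if it prints True, the two reads are consistent.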

CodePudding user response:

I couldn't reproduce the problem.
