Setting a slice of Pandas DataFrame with MultiIndex

Time:11-10

Consider the following example:

import pandas as pd

a = pd.DataFrame([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]], index=['a', 'b', 'c', 'd', 'e'], columns=['A', 'B'])
b = pd.Series([10], index=['c'])
a.loc['a':'c', 'A'] = b
print(a)

This sets the third value of A as expected (rows a and b become NaN because b has no values for those labels). I believe this is also the correct way to set a slice of the dataframe.

      A    B
a   NaN  2.0
b   NaN  3.0
c  10.0  4.0
d   4.0  5.0
e   5.0  6.0

Next, consider an example with multi-index.

d = pd.DataFrame([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]], index=pd.MultiIndex.from_tuples([(1, 'a'), (1, 'b'), (1, 'c'), (2, 'd'), (2, 'e')], names=['First', 'Second']), columns=['A', 'B'])
d.loc[1, 'A'] = b
print(d)

This does not set the third value: every row under First = 1 becomes NaN.

                A  B
First Second        
1     a       NaN  2
      b       NaN  3
      c       NaN  4
2     d       4.0  5
      e       5.0  6

[Edit] Here is a more direct example of what the problem is. I would have expected the below to work.

d = pd.DataFrame([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [7, 8]], index=pd.MultiIndex.from_tuples([(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'b'), (2, 'c')], names=['First', 'Second']), columns=['A', 'B'])
print(d)
#               A  B
# First Second      
# 1     a       1  2
#       b       2  3
#       c       3  4
# 2     a       4  5
#       b       5  6
#       c       7  8

d.loc[1, 'A'] = d.loc[2, 'A']
print(d)

#                 A  B
# First Second        
# 1     a       NaN  2
#       b       NaN  3
#       c       NaN  4
# 2     a       4.0  5
#       b       5.0  6
#       c       7.0  8

How do you set a slice of a dataframe with multi-index?

Answer:

Index alignment is why the MultiIndex case fails. In the single-index case, alignment is straightforward because both a and b carry a flat index. In the MultiIndex case, the rows selected from d are labelled by full tuples such as (1, 'c'), while b is labelled by 'c' alone; no labels match, hence the nulls.
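To see the mismatch in isolation, here is a minimal sketch that reindexes b against the six-row MultiIndex from the question's edit, once without and once with a level. The names idx, flat and by_level are made up for illustration, and the comments describe the behaviour I would expect from current pandas versions rather than a guarantee:

```python
import pandas as pd

# Same MultiIndex as the six-row frame in the question's edit.
idx = pd.MultiIndex.from_tuples(
    [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'b'), (2, 'c')],
    names=['First', 'Second'],
)
b = pd.Series([10], index=['c'])

# Plain reindex compares b's flat label 'c' with full tuples such as (1, 'c').
# Nothing matches, so every entry comes back as NaN.
flat = b.reindex(idx)

# Level-based reindex matches b's labels against the last level only
# and broadcasts the result across the other level.
by_level = b.reindex(idx, level=-1)

print(flat)
print(by_level)
```

The second result carries the same MultiIndex as d, which is what lets the assignment line up.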

One way around it is to make sure both indices are aligned; in this case a reindex suffices (shown here on the six-row frame from your edit):

d.loc[1, 'A'] = b.reindex(d.index, level=-1)

d
                 A  B
First Second         
1     a        NaN  2
      b        NaN  3
      c       10.0  4
2     a        4.0  5
      b        5.0  6
      c        7.0  8

Use the same concept for the second example in your question:

d.loc[1, 'A'] = d.loc[2, 'A'].reindex(d.index, level=-1)

d
              A  B
First Second      
1     a       4  2
      b       5  3
      c       7  4
2     a       4  5
      b       5  6
      c       7  8
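If the two slices are known to have the same length and row order, another option is to skip alignment altogether by assigning the raw values. This is a different technique from the reindex above, sketched here as an assumption-laden alternative rather than a general recommendation: it copies by position, so it gives wrong results silently if the rows are ordered differently.

```python
import pandas as pd

d = pd.DataFrame(
    [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [7, 8]],
    index=pd.MultiIndex.from_tuples(
        [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'b'), (2, 'c')],
        names=['First', 'Second'],
    ),
    columns=['A', 'B'],
)

# to_numpy() drops the index, so pandas has nothing to align on and
# copies the three values by position into the First = 1 rows.
d.loc[1, 'A'] = d.loc[2, 'A'].to_numpy()

print(d)
```

The reindex approach is the safer choice when the Second labels might differ between the two groups, since it matches by label instead of position.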