Pandas set values for part of column with multiindex

Time:03-19

I'm sure there is a simple way to do this, but I can't find it. I have a DataFrame with a MultiIndex, and I want to set the part of one column that corresponds to a single value of the first index level from a Series. Whatever I try, though, the values end up as NaN, even though the Series index matches the second index level.

An example of the things I tried:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  columns=['col1', 'col2', 'col3'],
                  index=pd.MultiIndex.from_product((('a1', 'a2'), ('b1', 'b2')),
                                                   names=['idx1', 'idx2']))
s = pd.Series([100, 101], index=['b1', 'b2'], name='col3')
df.loc['a1', 'col3'] = s

After this, I would expect df to be

           col1  col2  col3
idx1 idx2                  
a1   b1       0     1   100
     b2       3     4   101
a2   b1       6     7     8
     b2       9    10    11

but it is

           col1  col2  col3
idx1 idx2                  
a1   b1       0     1   NaN
     b2       3     4   NaN
a2   b1       6     7   8.0
     b2       9    10  11.0

Is there a one-liner that does this without .to_numpy() or .to_list() (which would force me to check the order of the items manually), and without resorting to something convoluted like:

df.loc['a1', 'col3'] = s.rename_axis('idx2').to_frame().assign(idx1='a1').reset_index().set_index(['idx1', 'idx2'])['col3']
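A likely explanation for the NaN values is that pandas aligns the Series on the full row labels of the target rows, which are (idx1, idx2) tuples, while s is labelled with bare idx2 values. The sketch below is an illustration of that assumption, not code from the pandas documentation; it compares the two sets of labels:

```python
import numpy as np
import pandas as pd

# Same setup as in the question
df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  columns=['col1', 'col2', 'col3'],
                  index=pd.MultiIndex.from_product((('a1', 'a2'), ('b1', 'b2')),
                                                   names=['idx1', 'idx2']))
s = pd.Series([100, 101], index=['b1', 'b2'], name='col3')

# Full labels of the rows under 'a1'. Selecting with a list (['a1'])
# keeps the idx1 level instead of dropping it.
target_labels = list(df.loc[['a1']].index)
print(target_labels)  # [('a1', 'b1'), ('a1', 'b2')]

# s is labelled with bare idx2 values, so none of its labels equals
# one of the (idx1, idx2) tuples above and alignment finds no matches.
print(set(s.index).isdisjoint(target_labels))  # True
```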

Answer:

Assuming every label in s's index also appears in the idx2 level under 'a1', you could do:

df.loc[('a1', s.index), 'col3'] = s.values

This works regardless of the order of the entries in s.

Output:

           col1  col2  col3
idx1 idx2                  
a1   b1       0     1   100
     b2       3     4   101
a2   b1       6     7     8
     b2       9    10    11
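A self-contained version of this answer, with its stated assumption checked explicitly before the assignment (the assert on s's labels is an addition, not part of the original answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  columns=['col1', 'col2', 'col3'],
                  index=pd.MultiIndex.from_product((('a1', 'a2'), ('b1', 'b2')),
                                                   names=['idx1', 'idx2']))
s = pd.Series([100, 101], index=['b1', 'b2'], name='col3')

# The answer assumes every label of s exists under 'a1' in the idx2 level;
# df.loc['a1'] drops idx1, so its index holds just the idx2 labels.
assert s.index.isin(df.loc['a1'].index).all(), "s has labels missing from df"

# Select the rows ('a1', label) for each label in s.index and write the raw values.
df.loc[('a1', s.index), 'col3'] = s.values
print(df)
```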

Answer:

You could use s.to_numpy() or s.tolist():

>>> df.loc['a1', 'col3'] = s.to_numpy()
>>> df
           col1  col2   col3
idx1 idx2                   
a1   b1       0     1  100.0
     b2       3     4  101.0
a2   b1       6     7    8.0
     b2       9    10   11.0
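If you do take the to_numpy() route, one way to avoid the manual order check raised in the question is to reindex s to the idx2 labels of the 'a1' rows first, so the array comes out in df's row order. The reordered s below is a hypothetical input chosen to show this; any label of the 'a1' block missing from s would become NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  columns=['col1', 'col2', 'col3'],
                  index=pd.MultiIndex.from_product((('a1', 'a2'), ('b1', 'b2')),
                                                   names=['idx1', 'idx2']))
# Hypothetical input, deliberately out of order relative to df's idx2 level
s = pd.Series([101, 100], index=['b2', 'b1'], name='col3')

# Put s in the order of the idx2 labels under 'a1', then drop the labels.
aligned = s.reindex(df.loc['a1'].index)
df.loc['a1', 'col3'] = aligned.to_numpy()
print(df)
```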

Answer:

Use:

df.loc['a1', 'col3'] = [100, 101]

Output: the expected DataFrame from the question, with col3 set to 100 and 101 for the a1 rows.

Based on your comment:

s = pd.Series([100, 101], index=['b1', 'b2'], name='col3')
s = s.to_list()
df.loc['a1', 'col3'] = s

This gives the same output.
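A list carries no labels, so this approach relies on s already being in the same order as the rows under 'a1', which is exactly the concern raised in the question. A small sketch with a deliberately reordered s (a hypothetical input) shows the positional behaviour:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  columns=['col1', 'col2', 'col3'],
                  index=pd.MultiIndex.from_product((('a1', 'a2'), ('b1', 'b2')),
                                                   names=['idx1', 'idx2']))
# Hypothetical input: same data as the question, listed in the opposite order
s = pd.Series([101, 100], index=['b2', 'b1'], name='col3')

# The list is written top to bottom, ignoring s's labels:
# ('a1', 'b1') receives 101 and ('a1', 'b2') receives 100.
df.loc['a1', 'col3'] = s.to_list()
print(df.loc['a1', 'col3'].tolist())  # [101, 100]
```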

Based on the other comment:

s = pd.Series([100, 101], index=(('a1', 'b1'), ('a1', 'b2')), name='col3')
df.loc['a1', 'col3'] = s

Output: again the expected DataFrame from the question, since the labels of s now match the full (idx1, idx2) row labels.
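A possible one-liner closer to what the question asks for, not taken from the answers above: give s the missing idx1 level with pd.concat (passing a dict makes each key the outer index level), so its labels match the target rows and the assignment aligns by label, as in the previous snippet. Treat it as a sketch that relies on pandas aligning the Series on the selected rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  columns=['col1', 'col2', 'col3'],
                  index=pd.MultiIndex.from_product((('a1', 'a2'), ('b1', 'b2')),
                                                   names=['idx1', 'idx2']))
s = pd.Series([100, 101], index=['b1', 'b2'], name='col3')

# The dict key 'a1' becomes the outer level: ('a1', 'b1'), ('a1', 'b2').
s_full = pd.concat({'a1': s})
print(s_full.index.tolist())  # [('a1', 'b1'), ('a1', 'b2')]

# The labels now match the target rows, so pandas aligns by label.
df.loc['a1', 'col3'] = s_full

# Because alignment is by label, a reordered s gives the same result.
df.loc['a1', 'col3'] = pd.concat({'a1': s.iloc[::-1]})
print(df)
```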
