Assign value to pandas DataFrame with hierarchical index based on stacked condition

Time:12-17

I have a pandas DataFrame with a two-level hierarchical index. I would like to set a value in one subset of rows based on a condition computed from a different subset.

I think this is best explained with a small example:

import numpy as np
import pandas as pd

example = pd.DataFrame({'ind_1': 5*[0] + 5*[1], 'ind_2': np.concatenate([np.arange(5), np.arange(5)]),
                        'col_1': np.random.random(size=10), 'col_2': np.random.random(size=10)})
example = example.set_index(['ind_1', 'ind_2'])
example_0 = example.loc[0]
example_1 = example.loc[1]
example['condition'] = False

condition = example_1['col_1'] > 0.5

with the DataFrames

$ example
                col_1     col_2  condition
ind_1 ind_2                               
0     0      0.430966  0.064335      False
      1      0.631710  0.313696      False
      2      0.354766  0.479626      False
      3      0.548612  0.793249      False
      4      0.144033  0.352583      False
1     0      0.586365  0.578001      False
      1      0.306403  0.399591      False
      2      0.312621  0.439042      False
      3      0.010637  0.232054      False
      4      0.762034  0.293433      False

$ example_0
          col_1     col_2
ind_2                    
0      0.430966  0.064335
1      0.631710  0.313696
2      0.354766  0.479626
3      0.548612  0.793249
4      0.144033  0.352583

$ example_1
          col_1     col_2
ind_2                    
0      0.586365  0.578001
1      0.306403  0.399591
2      0.312621  0.439042
3      0.010637  0.232054
4      0.762034  0.293433

$ condition
ind_2
0     True
1    False
2    False
3    False
4     True

Now I would like to assign a value as follows

example.loc[0].loc[condition] = True

which results (rightfully so) in a SettingWithCopyWarning and simply does not work in a more complex case.

The expected output would be

$ example
                col_1     col_2  condition
ind_1 ind_2                               
0     0      0.430966  0.064335      True
      1      0.631710  0.313696      False
      2      0.354766  0.479626      False
      3      0.548612  0.793249      False
      4      0.144033  0.352583      True
1     0      0.586365  0.578001      False
      1      0.306403  0.399591      False
      2      0.312621  0.439042      False
      3      0.010637  0.232054      False
      4      0.762034  0.293433      False

So condition is set on the rows with ind_1 == 0, but note that it was calculated from the rows with ind_1 == 1.

What would be the cleanest way of doing this?

CodePudding user response:

You can reindex on condition then pass the numpy array:

example.loc[0, 'condition'] = condition.reindex(example.loc[0].index).values

Note that you should not assign with chained indexing, i.e. .loc[].loc[]; use a single .loc[ind, column] call instead.

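Output (the numbers differ from the question because col_1 and col_2 are random; the True rows for ind_1 == 0 are those where col_1 > 0.5 for ind_1 == 1):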

                col_1     col_2  condition
ind_1 ind_2                               
0     0      0.295983  0.241758      False
      1      0.707799  0.765772       True
      2      0.822369  0.062530       True
      3      0.816543  0.621883      False
      4      0.048521  0.738549       True
1     0      0.433304  0.527344      False
      1      0.727886  0.557176      False
      2      0.653163  0.686719      False
      3      0.020094  0.887114      False
      4      0.777072  0.506128      False
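
For reference, here is a minimal end-to-end sketch of the approach above. The seeded generator (`rng`) is an assumption added only to make the run reproducible; it is not part of the original question. Everything else follows the question's setup and the single `.loc[ind, column]` assignment.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed, used here only so repeated runs give the same data.
rng = np.random.default_rng(0)

example = pd.DataFrame({
    'ind_1': 5*[0] + 5*[1],
    'ind_2': np.concatenate([np.arange(5), np.arange(5)]),
    'col_1': rng.random(10),
    'col_2': rng.random(10),
}).set_index(['ind_1', 'ind_2'])
example['condition'] = False

# Condition computed on the ind_1 == 1 block; it is indexed by ind_2 only.
condition = example.loc[1]['col_1'] > 0.5

# Align the condition to the ind_2 labels of the ind_1 == 0 block, then
# pass the raw values so the assignment is positional within that block.
aligned = condition.reindex(example.loc[0].index).values

# One .loc call selecting rows (ind_1 == 0) and the column together,
# so the write goes to `example` itself rather than to a temporary copy.
example.loc[0, 'condition'] = aligned

print(example)
```

The `reindex` step matters when the two blocks do not share exactly the same ind_2 labels in the same order; if a label is missing from `condition`, the reindexed result contains NaN for that row, which would need to be handled (for example with `fillna(False)`) before assigning.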