I have a pandas DataFrame with a two-level hierarchical index. I would like to compute a condition on one subset of the rows and use it to set a value on a different subset.
I think this is best explained with a small example:
import numpy as np
import pandas as pd
example = pd.DataFrame({'ind_1': 5*[0] + 5*[1], 'ind_2': np.concatenate([np.arange(5), np.arange(5)]),
'col_1': np.random.random(size=10), 'col_2': np.random.random(size=10)})
example = example.set_index(['ind_1', 'ind_2'])
example_0 = example.loc[0]
example_1 = example.loc[1]
example['condition'] = False
condition = example_1['col_1'] > 0.5
which gives the following objects:
$ example
                col_1     col_2  condition
ind_1 ind_2
0     0      0.430966  0.064335      False
      1      0.631710  0.313696      False
      2      0.354766  0.479626      False
      3      0.548612  0.793249      False
      4      0.144033  0.352583      False
1     0      0.586365  0.578001      False
      1      0.306403  0.399591      False
      2      0.312621  0.439042      False
      3      0.010637  0.232054      False
      4      0.762034  0.293433      False
$ example_0
col_1 col_2
ind_2
0 0.430966 0.064335
1 0.631710 0.313696
2 0.354766 0.479626
3 0.548612 0.793249
4 0.144033 0.352583
$ example_1
col_1 col_2
ind_2
0 0.586365 0.578001
1 0.306403 0.399591
2 0.312621 0.439042
3 0.010637 0.232054
4 0.762034 0.293433
$ condition
ind_2
0 True
1 False
2 False
3 False
4 True
Now I would like to assign a value as follows
example.loc[0].loc[condition] = True
which (rightly) raises a SettingWithCopyWarning and simply does not work in a more complex case.
The expected output would be
$ example
                col_1     col_2  condition
ind_1 ind_2
0     0      0.430966  0.064335       True
      1      0.631710  0.313696      False
      2      0.354766  0.479626      False
      3      0.548612  0.793249      False
      4      0.144033  0.352583       True
1     0      0.586365  0.578001      False
      1      0.306403  0.399591      False
      2      0.312621  0.439042      False
      3      0.010637  0.232054      False
      4      0.762034  0.293433      False
So condition is set on the rows with ind_1 == 0, but note that it was calculated from the rows with ind_1 == 1.
What would be the cleanest way of doing this?
CodePudding user response:
You can reindex condition to the index of the target subset, then pass the underlying NumPy array so that the values are assigned by position:
example.loc[0, 'condition'] = condition.reindex(example.loc[0].index).values
Note that you should not assign through chained indexing, i.e. .loc[].loc[], but with a single .loc[ind, column] call.
Output (from a different random draw than the one shown in the question):
                col_1     col_2  condition
ind_1 ind_2
0     0      0.295983  0.241758      False
      1      0.707799  0.765772       True
      2      0.822369  0.062530       True
      3      0.816543  0.621883      False
      4      0.048521  0.738549       True
1     0      0.433304  0.527344      False
      1      0.727886  0.557176      False
      2      0.653163  0.686719      False
      3      0.020094  0.887114      False
      4      0.777072  0.506128      False
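
For reference, here is a self-contained sketch of the whole flow, from building the frame to the single .loc assignment. The seeded generator (rng) and the fill_value=False argument are my own additions and are not in the question. The seed only makes the run repeatable. The fill_value guards against the case where the target subset has ind_2 labels that are missing from condition, which would otherwise produce NaN.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed, only so that repeated runs give the same frame.
rng = np.random.default_rng(0)

example = pd.DataFrame({
    'ind_1': 5 * [0] + 5 * [1],
    'ind_2': np.concatenate([np.arange(5), np.arange(5)]),
    'col_1': rng.random(size=10),
    'col_2': rng.random(size=10),
}).set_index(['ind_1', 'ind_2'])
example['condition'] = False

# Condition computed on the ind_1 == 1 subset; it is indexed by ind_2 only.
condition = example.loc[1]['col_1'] > 0.5

# Align the condition to the ind_2 labels of the ind_1 == 0 subset.
# Assumption: labels absent from `condition` should count as False.
target_index = example.loc[0].index
aligned = condition.reindex(target_index, fill_value=False)

# One .loc call with (row key, column), passing a plain array so the
# values are written by position instead of being aligned on the index.
example.loc[0, 'condition'] = aligned.to_numpy()

print(example)
```

If both subsets are known to share exactly the same ind_2 labels in the same order, the reindex step changes nothing and condition.to_numpy() could be passed directly. Keeping it makes the assignment robust to reordered or missing labels.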