I have the following dataframe:
df1 = pd.DataFrame(
{
"A_close": [10, 12, 15],
"B_close": [20, 19, 29],
"C_close": [23, 21, 4],
"D_close": [45, 47, 44],
},
index = ['01-01-2020', '01-02-2020', '01-03-2020']
)
df2 = pd.DataFrame(
{
"A_weight": [0.1, 0.3, 0.1],
"B_weight": [0.2, 0.1, 0.2],
"C_weight": [0.3, 0.4, 0.1],
"D_weight": [0.5, 0.2, 0.6],
},
index = ['01-01-2020', '01-02-2020', '01-03-2020']
)
df1 = df1.join(df2)
df1.columns = df1.columns.str.split('_', expand=True)
df1 = df1.sort_index(axis=1)
df1:
A B C D
close weight close weight close weight close weight
01-01-2020 10 0.1 20 0.2 23 0.3 45 0.5
01-02-2020 12 0.3 19 0.1 21 0.4 47 0.2
01-03-2020 15 0.1 29 0.2 4 0.1 44 0.6
I have defined the following function:
def wmedian(dtfrm):
df = dtfrm.unstack().sort_values('close')
return df.loc[df['weight'].cumsum() > 0.5, 'close'].iloc[0]
However, when I call the function wmedian(df1), I get an error: KeyError: 'close'.
The outcome I am trying to get is
df:
close
01-01-2020 23
01-02-2020 21
01-03-2020 44
What is missing in my function? I don't understand why I'm getting that error.
CodePudding user response:
Try to stack only one level:
wmedian = lambda x: x.loc[x['weight'].cumsum().gt(0.5), 'close'].head(1)
out = df1.stack(level=0).groupby(level=0).apply(wmedian) \
.reset_index(level=[1, 2], drop=True)
Output:
>>> out
01-01-2020 23
01-02-2020 21
01-03-2020 44
Name: close, dtype: int64
>>> df1.stack(level=0)
close weight
01-01-2020 A 10 0.1
B 20 0.2
C 23 0.3
D 45 0.5
01-02-2020 A 12 0.3
B 19 0.1
C 21 0.4
D 47 0.2
01-03-2020 A 15 0.1
B 29 0.2
C 4 0.1
D 44 0.6