I would like to compute the weighted mean of my DataFrame row wise where the weights are fixed for every n rows (always starting in the first row) and have to be dynamically updated for the rows in between.
The formula for the dynamic update should be for instance for w_2_1
that is for the second row in column 1:
wt_2_1 = ([var1_1_1 1] * wt_1_1)/(sum_i(wt_2_i))
wt_2_2 = ([var2_1_1 1] * wt_2_1)/(sum_i(wt_2_i))
where sum_i(wt_2_i)
is the sum of all weights in the respective row (here row 2).
To illustrate for n=2
and weights equal to wt_1=[0.5,0.5]
:
df = pd.DataFrame( {
'datetime': ['2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05', '2015-01-06'],
'var1': [var1_1_1, var1_1_2, var1_1_3, var1_1_4, var1_1_5],
'var2': [var2_1_1, var2_1_2, var2_1_3, var2_1_4, var2_1_5]})
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index([ 'datetime'], inplace =True)
Below is an example with numbers of the target column I would like, called 'weighted mean'
df = pd.DataFrame( {
'datetime': ['2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05', '2015-01-06'],
'var1': [0.07, 0.08, 0.04, 0.01, 0.03],
'var2': [0.02, 0.01, 0.02, 0.02, 0.08]})
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index([ 'datetime'], inplace =True)
df['weighted_mean'] = [0.045, wm_2, 0.03, wm_3, 0.055]
So the 1st, 3rd, and 5th values are using the fixed weights [0.05, 0.5]
. The other ones are dynamically updated, see below:
sum_i(wt_2_i) = ([0.07 1] * 0.5) ([0.02 1] * 0.5) = 1.045
wt_2_1 = ([0.07 1] * 0.5)/(sum_i(wt_2_i)) = 0.535/1.045 = 0.51196
wt_2_2 = ([0.02 1] * 0.5)/(sum_i(wt_2_i)) = 0.51/1.045 = 0.48804
wm_2 = 0.51196 * 0.08 0.48804 * 0.01 = 0.04583
CodePudding user response:
Let's try the following:
(i) Create a helper column: "ind".
(ii) Calculate LHS and RHS of sum_i(wt_2_i)
: ww
(iii) Calculate sum_i(wt_2_i)
of each row: sm
(iv) Using helper column "ind", fill in "weighted_mean" column using the product of the previous row's ww
and "val1" and "val2" values for each row.
df['ind'] = [1,0,1,0,1]
cols = ['var1','var2']
ww = (df[cols] 1) * 0.5 # use initial weights here
sm = ww.sum(axis=1)
df['weighted_mean'] = (sm - 1).where(df['ind']==1, (df[cols] * ww.shift()).sum(axis=1) / sm)
df = df.drop(columns='ind')
Output:
var1 var2 weighted_mean
datetime
2015-01-02 0.07 0.02 0.045000
2015-01-03 0.08 0.01 0.045837
2015-01-04 0.04 0.02 0.030000
2015-01-05 0.01 0.02 0.015172
2015-01-06 0.03 0.08 0.055000