Weighted Mean Row wise with dynamically updated weights in Pandas-CodePudding

I would like to compute the weighted mean of my DataFrame row wise where the weights are fixed for every n rows (always starting in the first row) and have to be dynamically updated for the rows in between.

The formula for the dynamic update should be for instance for w_2_1 that is for the second row in column 1:

wt_2_1 = ([var1_1_1   1] * wt_1_1)/(sum_i(wt_2_i))
wt_2_2 = ([var2_1_1   1] * wt_2_1)/(sum_i(wt_2_i))

where sum_i(wt_2_i) is the sum of all weights in the respective row (here row 2).

To illustrate for n=2 and weights equal to wt_1=[0.5,0.5]:

df = pd.DataFrame(  {
'datetime': ['2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05', '2015-01-06'],
'var1': [var1_1_1, var1_1_2, var1_1_3, var1_1_4, var1_1_5],
'var2': [var2_1_1, var2_1_2, var2_1_3, var2_1_4, var2_1_5]})
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index([ 'datetime'], inplace =True)

Below is an example with numbers of the target column I would like, called 'weighted mean'

df = pd.DataFrame(  {
'datetime': ['2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05', '2015-01-06'],
'var1': [0.07, 0.08, 0.04, 0.01, 0.03],
'var2': [0.02, 0.01, 0.02, 0.02, 0.08]})
df['datetime'] = pd.to_datetime(df['datetime'])
df.set_index([ 'datetime'], inplace =True)

df['weighted_mean'] = [0.045, wm_2, 0.03, wm_3, 0.055]

So the 1st, 3rd, and 5th values are using the fixed weights [0.05, 0.5]. The other ones are dynamically updated, see below:

sum_i(wt_2_i) = ([0.07   1] * 0.5)   ([0.02   1] * 0.5) = 1.045
wt_2_1 = ([0.07   1] * 0.5)/(sum_i(wt_2_i)) = 0.535/1.045 = 0.51196
wt_2_2 = ([0.02   1] * 0.5)/(sum_i(wt_2_i)) = 0.51/1.045  = 0.48804
wm_2 = 0.51196 * 0.08   0.48804 * 0.01 = 0.04583

CodePudding user response：

Let's try the following:

(i) Create a helper column: "ind".

(ii) Calculate LHS and RHS of sum_i(wt_2_i): ww

(iii) Calculate sum_i(wt_2_i) of each row: sm

(iv) Using helper column "ind", fill in "weighted_mean" column using the product of the previous row's ww and "val1" and "val2" values for each row.

df['ind'] = [1,0,1,0,1]
cols = ['var1','var2']
ww = (df[cols]   1) * 0.5 # use initial weights here
sm = ww.sum(axis=1)
df['weighted_mean'] = (sm - 1).where(df['ind']==1, (df[cols] * ww.shift()).sum(axis=1) / sm)
df = df.drop(columns='ind')

Output:

            var1  var2  weighted_mean
datetime                             
2015-01-02  0.07  0.02       0.045000
2015-01-03  0.08  0.01       0.045837
2015-01-04  0.04  0.02       0.030000
2015-01-05  0.01  0.02       0.015172
2015-01-06  0.03  0.08       0.055000