I have the following column in a DataFrame (as an example):
First row: '{"100":10,"50":3,"-90":2}'
Second row: '{"100":70,"50":3,"-90":2,"-40":3}'
I want to calculate a weighted average where the dictionary's keys are the values and the dictionary's values are the weights of the weighted average.
The final value of the first row should be 64.666, which is (100*10 + 50*3 - 90*2) / (10 + 3 + 2); the value of the second row should be 87.82, which is (100*70 + 50*3 - 90*2 - 40*3) / (70 + 3 + 2 + 3) = 6850/78.
Each dictionary might have hundreds of keys/values and the column might have thousands of rows. How can I code it efficiently? Preferably in a vectorized way.
CodePudding user response:
You can use json.loads and pandas.Series.apply:
import json

def cal_avg(dct):
    # parse the JSON string into a {value: weight} dict
    dct = json.loads(dct)
    # weighted average: sum(value * weight) / sum(weights)
    return sum(int(k) * v for k, v in dct.items()) / sum(dct.values())

df['dct'].apply(cal_avg)
Output:
0 64.666667
1 87.820513
Name: dct, dtype: float64
Input DataFrame:
import pandas as pd

df = pd.DataFrame({
    'dct': [
        '{"100":10,"50":3,"-90":2}',
        '{"100":70,"50":3,"-90":2,"-40":3}'
    ]
})
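Note that apply still runs a plain Python function once per row, so it isn't truly vectorized. A possible sketch of a more vectorized variant (the names wide, values and weights are mine, and it assumes every cell is a valid JSON string) expands the dicts into one wide frame and lets NumPy do the arithmetic:
import json
import numpy as np
import pandas as pd

# expand each JSON dict into its own row; keys absent from a row become NaN
wide = pd.DataFrame(df['dct'].map(json.loads).tolist())
values = wide.columns.astype(float).to_numpy()  # the dict keys are the values
weights = wide.to_numpy(dtype=float)            # the dict values are the weights
# nansum ignores the NaNs coming from missing keys
df['avg'] = np.nansum(values * weights, axis=1) / np.nansum(weights, axis=1)
The trade-off is memory: this materializes one column per distinct key across the whole column, so with hundreds of keys it trades RAM for speed.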
CodePudding user response:
Your first row is missing a column, so I'm assuming there's a NaN value there.
Edit: whoops, I mixed up my values and weights.
import numpy as np
import pandas as pd

df = pd.DataFrame({'100': [10, 70], '50': [3, 3], '-90': [2, 2], '-40': [None, 3]})
get_values = df.columns.values.astype('int')  # the column labels are the values
# nansum treats the missing weights (NaN) as zero
df['weighted_avg'] = np.nansum(get_values * df.values, axis=1) / np.nansum(df.values, axis=1)
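On the example frame this should reproduce the values from the question:
print(df['weighted_avg'])
0    64.666667
1    87.820513
Name: weighted_avg, dtype: float64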
CodePudding user response:
Try this simple for loop:
dataset = [{"100": 10, "50": 3, "-90": 2}, {"100": 70, "50": 3, "-90": 2, "-40": 3}]

for data in dataset:
    weighted_value = 0
    weight = 0
    for key, value in data.items():
        weighted_value += int(key) * value  # accumulate value * weight
        weight += value                     # accumulate the total weight
    weighted_average = weighted_value / weight
    print(weighted_average)
Performance-wise, I don't think this can be optimized much further.
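If the data starts out as the JSON-string column from the question, a minimal bridge (assuming the df with a 'dct' column from the first answer) would be:
import json

# build the list of dicts from the JSON-string column
dataset = [json.loads(s) for s in df['dct']]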