Python/NumPy: get average of array based on index

I have two NumPy arrays: the first holds the values and the second holds the indexes. I want to average the values array, grouped by the indexes array.

For example:

values = [1,2,3,4,5]
indexes = [0,0,1,1,2]
get_indexed_avg(values, indexes)
# should give me 
#   [1.5,    3.5,    5]

Here, the values in the indexes array represent the indexes in the final array. Hence:

The first two items in the values array are averaged to form index 0 of the final array.
The 3rd and 4th items in the values array are averaged to form index 1 of the final array.
Finally, the last item is used to form index 2 of the final array.

I do have a pure-Python solution, but it is ugly and very slow. Is there a better way, maybe using NumPy or a similar library?

CodePudding user response:

import pandas as pd
pd.Series(values).groupby(indexes).mean()
# OR
# pd.Series(values).groupby(indexes).mean().to_list()
# 0    1.5
# 1    3.5
# 2    5.0
# dtype: float64

CodePudding user response:

If you want to stick to Python's standard library, you can use the code below:

from statistics import mean

values = [1,2,3,4,5]
indexes = [0,0,1,1,2]

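def get_indexed_avg(values, indexes):
    # Collect the values that share an index into one list per index.
    data = {}
    for value, index_key in zip(values, indexes):
        data.setdefault(index_key, []).append(value)
    # Average each group; groups come out in order of first appearance.
    return [mean(group) for group in data.values()]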

print(get_indexed_avg(values, indexes))
# [1.5, 3.5, 5]
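
Since the question asks about NumPy specifically, here is a minimal sketch of a vectorized version built on np.bincount. It is not taken from either answer above. It assumes the indexes are non-negative integers and that every index from 0 up to the largest one appears at least once; a missing index would have a count of zero and produce nan for that slot.

```python
import numpy as np

def get_indexed_avg(values, indexes):
    # Assumption: indexes are non-negative integers with no gaps.
    values = np.asarray(values, dtype=float)
    indexes = np.asarray(indexes)
    # Sum of the values that fall on each index.
    sums = np.bincount(indexes, weights=values)
    # Number of values that fall on each index.
    counts = np.bincount(indexes)
    return sums / counts

values = [1, 2, 3, 4, 5]
indexes = [0, 0, 1, 1, 2]
print(get_indexed_avg(values, indexes))
# [1.5 3.5 5. ]
```

Both np.bincount calls make a single pass over the data, so this avoids the Python-level loop. If the indexes can have gaps or are not small integers, the pandas groupby answer above is likely the simpler choice.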