I have two numpy arrays: the first one is the values and the second one is the indexes. What I want to do is get the average of the values array based on the indexes array.
For example:
values = [1,2,3,4,5]
indexes = [0,0,1,1,2]
get_indexed_avg(values, indexes)
# should give me
# [1.5, 3.5, 5]
Here, the values in the indexes array represent the indexes in the final array. Hence:
- The first two items in the values array are averaged to form index 0 of the final array.
- The 3rd and 4th items in the values array are averaged to form index 1 of the final array.
- Finally, the last item is used for index 2 of the final array.
I do have a Python solution for this, but it is horrible and very slow. Is there a better solution, maybe using numpy or a similar library?
CodePudding user response:
import pandas as pd
pd.Series(values).groupby(indexes).mean()
# OR
# pd.Series(values).groupby(indexes).mean().to_list()
# 0 1.5
# 1 3.5
# 2 5.0
# dtype: float64
CodePudding user response:
If you want to use only Python's standard library, you can use the code below:
from statistics import mean
values = [1,2,3,4,5]
indexes = [0,0,1,1,2]
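def get_indexed_avg(values, indexes):
    # Group the values by their target index
    data = {}
    for i, index_key in enumerate(indexes):
        data[index_key] = data.get(index_key, []) + [values[i]]
    # Average each group; groups appear in order of first occurrence
    return [mean(data[key]) for key in data]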
print(get_indexed_avg(values, indexes))
# [1.5, 3.5, 5]
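Since the question asks about numpy specifically, here is a minimal sketch of one way to do it without a Python-level loop, using np.bincount. It assumes the indexes are non-negative integers; the function name get_indexed_avg is taken from the question, and this is only one possible approach, not necessarily the fastest for every input.

```python
import numpy as np

def get_indexed_avg(values, indexes):
    # Assumption: indexes holds non-negative integers
    values = np.asarray(values, dtype=float)
    indexes = np.asarray(indexes)
    # Sum of the values that fall into each index
    sums = np.bincount(indexes, weights=values)
    # Number of values that fall into each index
    counts = np.bincount(indexes)
    # Element-wise mean; an index that never occurs gives nan (0 / 0)
    return sums / counts

values = [1, 2, 3, 4, 5]
indexes = [0, 0, 1, 1, 2]
print(get_indexed_avg(values, indexes).tolist())
# [1.5, 3.5, 5.0]
```

Note that if some index between 0 and the maximum never appears, its slot in the result will be nan and numpy will emit a runtime warning for the division.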