Calculating the averages of elements in one array based on data in another array-CodePudding

I need to average the Y values corresponding to the values in the X array...

X=np.array([  1,  1,  2,  2,  2,  2,  3,  3 ... ])

Y=np.array([ 10, 30, 15, 10, 16, 10, 15, 20 ... ])

In other words, the equivalents of the 1 values in the X array are 10 and 30 in the Y array, and the average of this is 20, the equivalents of the 2 values are 15, 10, 16, and 10, and their average is 12.75, and so on...

How can I calculate these average values?

CodePudding user response：

One option is to use a property of linear regression (with categorical variables):

import numpy as np

x = np.array([  1,  1,  2,  2,  2,  2,  3,  3 ])
y = np.array([ 10, 30, 15, 10, 16, 10, 15, 20 ])

x_dummies = x[:, None] == np.unique(x)
means = np.linalg.lstsq(x_dummies, y, rcond=None)[0]
print(means) # [20.   12.75 17.5 ]

CodePudding user response：

You can try using pandas

import pandas as pd
import numpy as np

N = pd.DataFrame(np.transpose([X,Y]),
             columns=['X', 'Y']).groupby('X')['Y'].mean().to_numpy()
# array([20.  , 12.75, 17.5 ])

CodePudding user response：

import numpy as np

X = np.array([  1,  1,  2,  2,  2,  2,  3,  3])

Y = np.array([ 10, 30, 15, 10, 16, 10, 15, 20])

# Only unique values
unique_vals = np.unique(X);

# Loop for every value
for val in unique_vals:
    # Search for proper indexes in Y
    idx = np.where(X == val)
    # Mean for finded indexes
    aver = np.mean(Y[idx])
    print(f"Average for {val}: {aver}")

Result:

Average for 1: 20.0

Average for 2: 12.75

Average for 3: 17.5

CodePudding user response：

you can use something like the below code :

import numpy as np

X=np.array([  1,  1,  2,  2,  2,  2,  3,  3])

Y=np.array([ 10, 30, 15, 10, 16, 10, 15, 20])


def groupby(a, b):
    # Get argsort indices, to be used to sort a and b in the next steps
    sidx = b.argsort(kind='mergesort')
    a_sorted = a[sidx]
    b_sorted = b[sidx]

    # Get the group limit indices (start, stop of groups)
    cut_idx = np.flatnonzero(np.r_[True,b_sorted[1:] != b_sorted[:-1],True])

    # Split input array with those start, stop ones
    out = [a_sorted[i:j] for i,j in zip(cut_idx[:-1],cut_idx[1:])]
    return out

group_by_array=groupby(Y,X)
for item in group_by_array:
    print(np.average(item))

I use the information in the below link to answer the question: Group numpy into multiple sub-arrays using an array of values

CodePudding user response：

I think this solution should work:

avg_arr = []
i = 1
while i <= np.max(x):
    inds = np.where(x == i)
    my_val = np.average(y[inds[0][0]:inds[0][-1]])
    avg_arr.append(my_val)
    i =1

Definitely, not the cleanest, but I was able to test it quickly and it does indeed work.