For each row of a data array X, I want to find the row (number or index) from a data array Y that shows the highest correlation.
X Row | Value 1 | Value 2 | Value 3 | Row index in Y with highest Corr | |
---|---|---|---|---|---|
X1 | 10 | 5 | 1 | ? | |
X2 | 1 | 5 | 10 | ? |
Y Row | Value 1 | Value 2 | Value 3 |
---|---|---|---|
Y1 | 1 | 4 | 10 |
Y2 | 3 | 4 | 3 |
Y3 | 10 | 4 | 1 |
... |
From that, I would want to obtain the row index in Y with the highest correlation to each row in X
X Row | Value 1 | Value 2 | Value 3 | Row index in Y with highest Corr | |
---|---|---|---|---|---|
X1 | 10 | 5 | 1 | Y3 | |
X2 | 1 | 5 | 10 | Y1 |
I tried to apply a combination of Index and SortN to Arrayformula(CORREL(X1,Y1:Y)) but that does not work because it seems that correl will concatenate the rows if one argument consists of an array instead of a vector.
CodePudding user response:
Use byrow()
and filter()
, like this:
=byrow(
B2:D3,
lambda(
rowX,
lambda(
labelY, correlY,
single( filter(labelY, correlY = max(correlY)) )
)(
A11:A13,
byrow(
B11:D13,
lambda(
rowY,
correl(rowX, rowY)
)
)
)
)
)
...where the range A3:D3
holds the X table and the range A11:D13
holds the Y table.