Home > Blockchain >  How to select a row from an array based on the highest correlation to a given row?
How to select a row from an array based on the highest correlation to a given row?

Time:01-10

For each row of a data array X, I want to find the row (number or index) from a data array Y that shows the highest correlation.

X Row Value 1 Value 2 Value 3 Row index in Y with highest Corr
X1 10 5 1 ?
X2 1 5 10 ?
Y Row Value 1 Value 2 Value 3
Y1 1 4 10
Y2 3 4 3
Y3 10 4 1
...

From that, I would want to obtain the row index in Y with the highest correlation to each row in X

X Row Value 1 Value 2 Value 3 Row index in Y with highest Corr
X1 10 5 1 Y3
X2 1 5 10 Y1

I tried to apply a combination of Index and SortN to Arrayformula(CORREL(X1,Y1:Y)) but that does not work because it seems that correl will concatenate the rows if one argument consists of an array instead of a vector.

CodePudding user response:

Use byrow() and filter(), like this:

=byrow( 
  B2:D3, 
  lambda( 
    rowX, 
    lambda( 
      labelY, correlY, 
      single( filter(labelY, correlY = max(correlY)) ) 
    )( 
      A11:A13, 
      byrow( 
        B11:D13, 
        lambda( 
          rowY, 
          correl(rowX, rowY) 
        ) 
      ) 
    ) 
  ) 
)

...where the range A3:D3 holds the X table and the range A11:D13 holds the Y table.

  • Related