Get the correlation for each row in R

Time:06-09

Here's the data:

    tmp <- tibble::tibble(id = rep(1, 16), wells = 1:16, eur = c(21,23,45,43,23,55,34,44,65,56,66,76,87,67,76,56))

    tmp
    #> # A tibble: 16 × 3
    #>       id wells   eur
    #>    <dbl> <int> <dbl>
    #>  1     1     1    21
    #>  2     1     2    23
    #>  3     1     3    45
    #>  4     1     4    43
    #>  5     1     5    23
    #>  6     1     6    55
    #>  7     1     7    34
    #>  8     1     8    44
    #>  9     1     9    65
    #> 10     1    10    56
    #> 11     1    11    66
    #> 12     1    12    76
    #> 13     1    13    87
    #> 14     1    14    67
    #> 15     1    15    76
    #> 16     1    16    56

I want to compute the correlation between wells and eur cumulatively and store it in a new column. The first row would be 1; the second value would be the correlation of wells 1:2 with eur c(21, 23); the third would be the correlation of wells 1:3 with eur c(21, 23, 45); and so on. I tried cor(tmp$wells, tmp$eur), but it returns only a single value.

CodePudding user response:

1) Define a single complex vector whose real part is wells and whose imaginary part is eur, then pass it to rollapplyr with widths 1, 2, 3, ..., n(). The applied function extracts the real and imaginary components and takes their correlation. Note that the correlation computed from a single observation is undefined (NA), not 1.

library(dplyr)
library(zoo)

cor_ <- function(x) cor(Re(x), Im(x))

tmp %>%
  group_by(id) %>%
  mutate(cor = rollapplyr(wells + eur * 1i, 1:n(), cor_)) %>%
  ungroup

giving:

# A tibble: 16 x 4
      id wells   eur    cor
   <dbl> <int> <dbl>  <dbl>
 1     1     1    21 NA    
 2     1     2    23  1    
 3     1     3    45  0.901
 4     1     4    43  0.891
 5     1     5    23  0.318
 6     1     6    55  0.620
 7     1     7    34  0.473
 8     1     8    44  0.521
 9     1     9    65  0.684
10     1    10    56  0.728
11     1    11    66  0.790
12     1    12    76  0.841
13     1    13    87  0.877
14     1    14    67  0.867
15     1    15    76  0.878
16     1    16    56  0.817
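
To make the complex-number trick in (1) more concrete, here is a minimal base-R sketch on the first four rows of the data. It only illustrates how two columns are packed into one vector and unpacked again inside cor_; the short vectors and the name z are chosen for illustration and are not part of the original answer.

```r
# First four rows of the example data
wells <- 1:4
eur <- c(21, 23, 45, 43)

# Pack both columns into one complex vector:
# real part = wells, imaginary part = eur
z <- wells + eur * 1i

# Unpacking recovers both columns exactly
stopifnot(all(Re(z) == wells), all(Im(z) == eur))

# Same helper as in the answer: correlate the real and imaginary parts
cor_ <- function(x) cor(Re(x), Im(x))

# A window over the first three elements is equivalent to
# cor(wells[1:3], eur[1:3])
r3 <- cor_(z[1:3])

# With a single observation the correlation is undefined, so R returns NA
r1 <- suppressWarnings(cor_(z[1]))
```

Because rollapplyr passes one vector per window to the function, packing the pair this way lets a one-argument function see both columns at once.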

2) This alternative uses the same packages and gives the same result without complex numbers: it rolls over the row indices and subsets wells and eur inside the function. Approach (1) does have the advantage that the same cor_ can be reused no matter which two columns are chosen.

tmp %>%
  group_by(id) %>%
  mutate(cor = rollapplyr(1:n(), 1:n(), function(ix) cor(wells[ix], eur[ix]))) %>%
  ungroup
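
As a cross-check that needs no packages, the same expanding-window correlation can be sketched in base R. This is not part of the original answer. The helper name expanding_cor is hypothetical, and a plain data.frame stands in for the tibble.

```r
# Same data as in the question, as a plain data.frame
tmp <- data.frame(
  id = rep(1, 16),
  wells = 1:16,
  eur = c(21, 23, 45, 43, 23, 55, 34, 44, 65, 56, 66, 76, 87, 67, 76, 56)
)

# Hypothetical helper: element i is cor(x[1:i], y[1:i])
expanding_cor <- function(x, y) {
  vapply(seq_along(x), function(i) {
    # A single observation has no defined correlation
    if (i < 2) return(NA_real_)
    cor(x[seq_len(i)], y[seq_len(i)])
  }, numeric(1))
}

# Compute per id group, writing results back in the original row order
tmp$cor <- NA_real_
for (ix in split(seq_len(nrow(tmp)), tmp$id)) {
  tmp$cor[ix] <- expanding_cor(tmp$wells[ix], tmp$eur[ix])
}

head(tmp, 3)
```

The first value is NA and the second is 1, since any two points with distinct values are perfectly correlated. This matches the first rows of the output shown for approach (1).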