How to calculate the correlation of 2 variables for every n rows of a data frame in R?

Time:05-16

I have a data frame with 200 * 1000 rows and 6 columns. I want to calculate the correlation between two columns, cor(df$y1, df$y2), for every block of 200 rows, so that I get 1000 different correlation values as a result. When I wanted the sums for every 200 rows, I could simply use

rowsum(df,rep(1:1000,each=200))

but R has no equivalent command, such as a rowcor, that I could use for correlations.

CodePudding user response:

We can use a group-by approach: build a grouping index that changes every 200 rows, then compute cor() within each group.

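# gl() builds a grouping factor that changes every 200 rows;
# by() then applies cor() to each 200-row block
by(df[c('y1', 'y2')], as.integer(gl(nrow(df), 200, nrow(df))),
      FUN = function(x) cor(x$y1, x$y2))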
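
If a plain numeric vector is more convenient than the object returned by by(), the same grouping could be sketched with split() and vapply(). This is only an illustration: the block-size variable `k` and the small stand-in data frame are assumptions, not part of the original answer. The integer-division index also tolerates a row count that is not a multiple of `k`.

```r
# Small stand-in data (hypothetical): 5 blocks of 200 rows
set.seed(24)
df <- data.frame(y1 = rnorm(1000), y2 = rnorm(1000))

k <- 200                                   # rows per block (assumed name)
grp <- (seq_len(nrow(df)) - 1) %/% k + 1   # 1 for rows 1..200, 2 for 201..400, ...

# One correlation per block, returned as a named numeric vector
cors <- vapply(split(df[c("y1", "y2")], grp),
               function(x) cor(x$y1, x$y2),
               numeric(1))
length(cors)   # one value per block
```

With the full 200 * 1000-row data frame, `cors` would have 1000 elements, named by block number.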

Or using the tidyverse (dplyr):

library(dplyr)
out <- df %>%
   group_by(grp = as.integer(gl(n(), 200, n()))) %>%
   summarise(Cor = cor(y1, y2))
dim(out)
# [1] 1000    2
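
In the spirit of the rowsum() call from the question, a fully vectorised variant is also possible. The sketch below is an assumption-laden illustration rather than part of the original answer: it reshapes each column into a matrix with one block per column and applies the Pearson formula column-wise. It requires the number of rows to be an exact multiple of the block size, and the names `k`, `m1`, `m2` and `cors_vec` are made up for the example.

```r
# Small stand-in data (hypothetical): 5 blocks of 200 rows
set.seed(24)
df <- data.frame(y1 = rnorm(1000), y2 = rnorm(1000))

k <- 200   # rows per block; nrow(df) must be a multiple of k here
stopifnot(nrow(df) %% k == 0)

# matrix() fills column by column, so each column holds k consecutive rows
m1 <- matrix(df$y1, nrow = k)
m2 <- matrix(df$y2, nrow = k)

# Centre each column, then compute Pearson's r for each pair of columns
c1 <- sweep(m1, 2, colMeans(m1))
c2 <- sweep(m2, 2, colMeans(m2))
cors_vec <- colSums(c1 * c2) / sqrt(colSums(c1^2) * colSums(c2^2))
```

This avoids calling cor() once per group, which may matter when there are many blocks; for 1000 blocks the grouped approaches are usually fast enough.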

Data

set.seed(24)
df <- as.data.frame(matrix(rnorm(200 *1000 * 6), ncol = 6))
names(df)[1:2] <- c('y1', 'y2')