I have a data frame with 200*1000 rows and 6 columns. I want to calculate the correlation between two columns, cor(df$y1, df$y2),
for every block of 200 rows, so that I get 1000 different correlation values as a result.
When I wanted to calculate the sums of every 200 rows I could simply use
rowsum(df,rep(1:1000,each=200))
but R has no equivalent command, such as a rowcor,
that I could use in the same way for correlations.
CodePudding user response:
We can use a group-by approach in base R with by():
# gl() labels rows 1 to 200 as block 1, rows 201 to 400 as block 2, and so on
by(df[c('y1', 'y2')], as.integer(gl(nrow(df), 200, nrow(df))),
   FUN = function(x) cor(x$y1, x$y2))
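Note that by() returns an object of class "by" rather than a plain vector. If a numeric vector is more convenient, something along these lines with split() and vapply() should give the same values. This is only a sketch: the small 5-block data frame and the names grp and cors are illustrative, not from the question.

```r
# Small illustrative data: 5 blocks of 200 rows (the question has 1000 blocks)
set.seed(24)
df <- data.frame(y1 = rnorm(200 * 5), y2 = rnorm(200 * 5))

# Block index: 1 for rows 1 to 200, 2 for rows 201 to 400, and so on
grp <- rep(seq_len(nrow(df) / 200), each = 200)

# Split the row indices by block, then compute one correlation per block
cors <- vapply(split(seq_len(nrow(df)), grp),
               function(i) cor(df$y1[i], df$y2[i]),
               numeric(1))
length(cors)
```

The result is a named numeric vector with one element per block, so it can be assigned to a column or plotted directly.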
Or using the tidyverse (dplyr):
library(dplyr)
out <- df %>%
  group_by(grp = as.integer(gl(n(), 200, n()))) %>%
  summarise(Cor = cor(y1, y2))
> dim(out)
[1] 1000 2
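Since the question mentions rowsum(), a vectorized "rowcor" can also be sketched on top of it: the Pearson correlation of a block depends only on the block sums of y1, y2, y1^2, y2^2 and y1*y2. The name rowcor, the helper names and the small data frame below are hypothetical, and the sum-based formula can lose precision when the means are large relative to the spread, so treat this as an illustration rather than a drop-in replacement for cor().

```r
# Small illustrative data: 5 blocks of 200 rows (the question has 1000 blocks)
set.seed(24)
df <- data.frame(y1 = rnorm(200 * 5), y2 = rnorm(200 * 5))

n <- 200                                          # rows per block
grp <- rep(seq_len(nrow(df) / n), each = n)       # block index for every row

# Per-block sums of x, y, x^2, y^2 and x*y in a single rowsum() call
s <- rowsum(cbind(x  = df$y1,   y  = df$y2,
                  xx = df$y1^2, yy = df$y2^2,
                  xy = df$y1 * df$y2), grp)

# Pearson correlation of each block, computed from the sums
num <- n * s[, "xy"] - s[, "x"] * s[, "y"]
den <- sqrt(n * s[, "xx"] - s[, "x"]^2) * sqrt(n * s[, "yy"] - s[, "y"]^2)
rowcor <- num / den
```

This avoids a per-block function call, which may matter with many blocks; for 1000 blocks the grouped approaches above are usually fast enough.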
Data
set.seed(24)
df <- as.data.frame(matrix(rnorm(200 *1000 * 6), ncol = 6))
names(df)[1:2] <- c('y1', 'y2')