I have the dataset below, and I am trying to compute 3-day rolling correlations across x, y, z and a, i.e. rolling correlations for every pair: xy, xz, xa, yz, ya and so on. As you can see, the data for y and a is incomplete, but I would like the rolling correlations involving them to start from the date where they first have values (i.e. id 3 for y and id 4 for a).
How should I accomplish this? I don't know where to start.
set.seed(42)
n <- 10
dat <- data.frame(id=1:n,
                  date=seq.Date(as.Date("2020-12-22"), as.Date("2020-12-31"), "day"),
                  x=rnorm(n),
                  y=rnorm(n),
                  z=rnorm(n),
                  a=rnorm(n))
dat$y[1:2] <- NA  # y has no values for the first 2 days
dat$a[1:3] <- NA  # a has no values for the first 3 days
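The start points mentioned above can be computed rather than read off by eye. A minimal base-R sketch (the names `first_row` and `start_dates` are only illustrative) that finds the first non-`NA` row of each column:

```r
set.seed(42)
n <- 10
dat <- data.frame(id = 1:n,
                  date = seq.Date(as.Date("2020-12-22"), as.Date("2020-12-31"), "day"),
                  x = rnorm(n), y = rnorm(n), z = rnorm(n), a = rnorm(n))
dat$y[1:2] <- NA
dat$a[1:3] <- NA

cols <- c("x", "y", "z", "a")
# position of the first non-NA value in each column
first_row <- sapply(dat[cols], function(v) which(!is.na(v))[1])
# the matching dates, one per column
start_dates <- dat$date[first_row]
first_row
start_dates
```

For this data it gives rows 1, 3, 1 and 4, i.e. y starts on 2020-12-24 and a on 2020-12-25. A pair's correlations can only start at the later of its two start rows.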
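I found the code below on Stack Overflow, but it only correlates the first column with each of the others, not every pair of columns:
library(zoo)
# x inside the function is one 5-row window of the selected columns
rollapplyr(dat[c("x", "y", "z", "a")], 5, function(x) cor(x[, 1], x[, -1]), by.column = FALSE)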
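Why the snippet only covers the first column: inside each window, `cor(x[, 1], x[, -1])` correlates column 1 with the remaining columns and returns a 1 x 3 matrix. Calling `cor()` on the whole window instead returns the full symmetric matrix, from which each unordered pair can be read off once via the upper triangle. A sketch on a single window (rows 4 to 6, chosen because all four columns have values there; `win`, `first_only`, `all_pairs` and `pair_cor` are illustrative names):

```r
set.seed(42)
n <- 10
dat <- data.frame(id = 1:n,
                  date = seq.Date(as.Date("2020-12-22"), as.Date("2020-12-31"), "day"),
                  x = rnorm(n), y = rnorm(n), z = rnorm(n), a = rnorm(n))
dat$y[1:2] <- NA
dat$a[1:3] <- NA

cols <- c("x", "y", "z", "a")
# one 3-row window where every column has data
win <- as.matrix(dat[4:6, cols])
# what the found snippet computes per window: column 1 vs the rest (1 x 3)
first_only <- cor(win[, 1], win[, -1])
# all pairs at once: a symmetric 4 x 4 correlation matrix
all_pairs <- cor(win)
# read each unordered pair once from the upper triangle
ut <- which(upper.tri(all_pairs), arr.ind = TRUE)
pair_cor <- setNames(all_pairs[ut],
                     paste0(cols[ut[, "row"]], cols[ut[, "col"]]))
pair_cor
```

The upper triangle is traversed column by column, so the pairs come out as xy, xz, yz, xa, ya, za.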
Answer:
`combn` produces all the pairwise combinations of the column names:
cols <- c("x", "y", "z", "a")
combn(cols, 2)
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] "x"  "x"  "x"  "y"  "y"  "z"
# [2,] "y"  "z"  "a"  "z"  "a"  "a"
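`combn` also accepts a function, which is handy for labelling the pairs. A small sketch (`pair_names` is an illustrative name):

```r
cols <- c("x", "y", "z", "a")
# FUN is applied to each pair; paste(..., collapse = "") turns c("x", "y") into "xy"
pair_names <- combn(cols, 2, paste, collapse = "")
pair_names
```

This yields "xy" "xz" "xa" "yz" "ya" "za", in the same column order as the matrix above, so it could be used as `names()` for a list of per-pair results produced by `combn(..., simplify = FALSE)`.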
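`combn` also takes a function argument, which is applied to each pair of columns. Inside it, first use `na.omit` to drop all rows with `NA`s, so each pair starts at the first date on which both columns have values. Then use `mapply` to subset the incrementing index sequences `1:3`, `2:4`, ... and calculate the correlation of each window, until `nrow` is reached.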
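To see what the `mapply` call iterates over, here is a sketch that returns the row indices of each window instead of correlations, for a pair with 8 complete rows (the x-y pair after `na.omit`) and `w = 3` (`windows` is an illustrative name):

```r
w <- 3  # window width
n <- 8  # rows left for the x-y pair after na.omit (y is NA on days 1-2)
# list(1:w) is recycled against each offset z, so each call returns 1:w shifted by z
windows <- mapply(function(y, z) y + z, list(1:w), 0:(n - w), SIMPLIFY = FALSE)
windows[[1]]
windows[[length(windows)]]
length(windows)
```

Each window is shifted down by one row, giving rows 1-3 through 6-8, i.e. n - w + 1 = 6 windows, which matches the 6 correlations reported for the first pair below.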
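w <- 3 ## size of the rolling window
combn(dat[cols], 2, function(x) {
  X <- na.omit(x)  # drop rows where either column of the pair is NA
  n <- nrow(X)
  # window k covers rows (1:w) + k, for offsets k = 0, ..., n - w
  mapply(function(y, z) cor(X[y + z, 1], X[y + z, 2]), list(1:w), 0:(n - w))
}, simplify=FALSE)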
# [[1]]
# [1] 0.5307784 -0.9874843 -0.8364802 0.2407730 0.3655328 -0.4458231
#
# [[2]]
# [1] 0.8121466 0.9652715 0.3304100 0.8278965 -0.1425097 0.5832558 0.9959705
# [8] 0.8696023
#
# [[3]]
# [1] 0.6733985 0.2194488 0.5593983 -0.6589249 -0.9291184
#
# [[4]]
# [1] 0.97528684 -0.90599558 -0.42319742 0.92882443 0.28058418 0.05427966
#
# [[5]]
# [1] -0.7815678 -0.7182037 -0.6698260 0.4592962 0.7452225
#
# [[6]]
# [1] 0.9721521 0.9343926 -0.3470329 -0.7237291 -0.6253825
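A variant of the same idea (a sketch, not part of the answer; `roll_pair_cor` and `out` are hypothetical names) that labels each correlation with the end date of its window and names the list by pair. `complete.cases` plays the role of `na.omit`. Like the code above, it assumes the `NA`s are only leading: with gaps in the middle, a window would span non-consecutive dates.

```r
set.seed(42)
n <- 10
dat <- data.frame(id = 1:n,
                  date = seq.Date(as.Date("2020-12-22"), as.Date("2020-12-31"), "day"),
                  x = rnorm(n), y = rnorm(n), z = rnorm(n), a = rnorm(n))
dat$y[1:2] <- NA
dat$a[1:3] <- NA

cols <- c("x", "y", "z", "a")
w <- 3  # 3-day window

# rolling correlation of columns i and j over the rows where both are non-NA
roll_pair_cor <- function(i, j) {
  ok <- complete.cases(dat[c(i, j)])
  X <- dat[ok, c(i, j)]
  d <- dat$date[ok]
  if (nrow(X) < w) return(numeric(0))  # not enough rows for a single window
  ends <- seq.int(w, nrow(X))          # last row of each window
  res <- vapply(ends, function(e) {
    rows <- (e - w + 1):e
    cor(X[rows, 1], X[rows, 2])
  }, numeric(1))
  setNames(res, format(d[ends]))       # label each value by its window's end date
}

pairs <- combn(cols, 2)  # 2 x 6 matrix of column names
out <- lapply(seq_len(ncol(pairs)), function(k) roll_pair_cor(pairs[1, k], pairs[2, k]))
names(out) <- paste0(pairs[1, ], pairs[2, ])
out$xy
```

The values are the same as in the list above; only the labels are added, so for example the first x-y value is tagged with 2020-12-26, the third date on which both x and y have data.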
Answer:
Create a data frame with only the wanted columns and then use `rollapplyr` with `cor`. `cor` takes a `use=` argument that specifies how missing values are handled. See `?cor` for the values it can take, since you may or may not wish to use the value we used below.
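A small illustration of what `use=` changes (toy vectors, not the question's data). With the default `use = "everything"`, any `NA` makes the result `NA`; `"pairwise.complete.obs"` drops the incomplete rows separately for each pair. The answer's `use = "pairwise"` is an abbreviation that `cor` partially matches to `"pairwise.complete.obs"`.

```r
x <- c(1, 2, 3, 4)
y <- c(NA, 2, 4, 8)
cor(x, y)                    # default use = "everything": the NA propagates
cor(x, y, use = "pairwise")  # partial match of "pairwise.complete.obs": row 1 dropped
cor(x[-1], y[-1])            # the same value, computed by hand on rows 2-4
```

For a whole matrix, `"complete.obs"` would instead drop every row that has an `NA` in any column, so in this question the x-z correlations would also lose the rows where only y or a is missing; `"pairwise.complete.obs"` avoids that.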
The result `r` is a matrix whose i-th row holds, strung out column by column, the correlation matrix of the 5 `dat2` rows ending in and including row `i`. That is, `matrix(r[i, ], 4, 4)` is the correlation matrix of `dat2[i-(4:0), ]`.
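The strung-out layout is column-major: `c()` of a correlation matrix lists column 1 first, then column 2, and so on, and `matrix(..., 4, 4)` refills it in the same order. A sketch on hypothetical random data (`m`, `cm`, `flat` and `back` are illustrative names), naming the entries the same way the code below names the columns of `r`:

```r
set.seed(1)  # hypothetical random data, just to have a correlation matrix
m <- matrix(rnorm(20), ncol = 4, dimnames = list(NULL, c("x", "y", "z", "a")))
cm <- cor(m)
nc <- ncol(cm)
# string the matrix out column by column, like one row of r
flat <- c(cm)
names(flat) <- paste(colnames(cm)[c(row(diag(nc)))],
                     colnames(cm)[c(col(diag(nc)))], sep = ".")
head(flat)
# refilling column by column recovers the matrix (without dimnames)
back <- matrix(flat, nc, nc)
```

So the entry named `y.x` is row y, column x of the correlation matrix, and the names run x.x, y.x, z.x, a.x, x.y, ...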
We can also create `ar`, a 3d array such that `ar[i,,]` is the correlation matrix of the 5 rows of `dat2` ending in and including row `i`.
That is, for each i in 5, ..., nrow(dat2) these three are equal:
1. cor(dat2[i-(4:0), ], use = "pairwise")
2. matrix(r[i, ], 4, 4)
3. ar[i,,]
(The first 4 rows of `r` are all NA, since fewer than 5 rows end at each of them; `fill = NA` pads them.) We check these equivalences for i = 5 below.
library(zoo)
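w <- 5  # window width (use 3 for the 3-day window in the question)
dat2 <- dat[c("x", "y", "z", "a")]
nr <- nrow(dat2)
nc <- ncol(dat2)
# each row of r is one window's nc x nc correlation matrix, strung out column by column
r <- rollapplyr(dat2, w, cor, use = "pairwise", by.column = FALSE, fill = NA)
colnames(r) <- paste(names(dat2)[c(row(diag(nc)))],
                     names(dat2)[c(col(diag(nc)))], sep = ".")
# reshape into an nr x nc x nc array so that ar[i,,] is the i-th correlation matrix
ar <- array(r, c(nr, nc, nc),
            dimnames = list(NULL, names(dat2), names(dat2)))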
# run some checks
cor5 <- cor(dat2[1:w, ], use = "pairwise") # cor of 1st w rows
# same except for names
all.equal(unname(cor5), matrix(r[w, ], nc))
## [1] TRUE
all.equal(cor5, ar[w,,])
## [1] TRUE
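If zoo is not available, the same array can be filled with a plain loop. This is a sketch assuming the `dat` above (`ar2` and `xy` are illustrative names); it also shows how to pull one pair's rolling series, here x and y, out of the array, labelled by date:

```r
set.seed(42)
n <- 10
dat <- data.frame(id = 1:n,
                  date = seq.Date(as.Date("2020-12-22"), as.Date("2020-12-31"), "day"),
                  x = rnorm(n), y = rnorm(n), z = rnorm(n), a = rnorm(n))
dat$y[1:2] <- NA
dat$a[1:3] <- NA

w <- 5
dat2 <- dat[c("x", "y", "z", "a")]
nr <- nrow(dat2)
nc <- ncol(dat2)
# ar2[i, , ] is the correlation matrix of rows (i - w + 1):i; slices before w stay NA
ar2 <- array(NA_real_, c(nr, nc, nc),
             dimnames = list(NULL, names(dat2), names(dat2)))
for (i in w:nr) {
  ar2[i, , ] <- cor(dat2[(i - w + 1):i, ], use = "pairwise.complete.obs")
}
# one pair's rolling correlation, labelled by date
xy <- setNames(ar2[, "x", "y"], format(dat$date))
xy
```

The loop trades zoo's generality for a dependency-free version; for this data it should agree slice by slice with `ar`.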
The above gives a matrix whose rows are strung-out correlation matrices and a 3d array whose slices are correlation matrices. Another possible output is a list of correlation matrices (with `NULL` for the first `w - 1` positions, where there is no full window):
lapply(1:nr, function(i) {
  if (i >= w) cor(dat2[i-((w-1):0), ], use = "pairwise")
})
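If a long table is more convenient than a list of matrices, each matrix can be flattened to one row per date and pair, keeping each unordered pair once via the upper triangle. A base-R sketch (`cors` and `long` are illustrative names; `w = 5` as in the answer):

```r
set.seed(42)
n <- 10
dat <- data.frame(id = 1:n,
                  date = seq.Date(as.Date("2020-12-22"), as.Date("2020-12-31"), "day"),
                  x = rnorm(n), y = rnorm(n), z = rnorm(n), a = rnorm(n))
dat$y[1:2] <- NA
dat$a[1:3] <- NA

w <- 5
dat2 <- dat[c("x", "y", "z", "a")]
nr <- nrow(dat2)
cors <- lapply(1:nr, function(i) {
  if (i >= w) cor(dat2[i-((w-1):0), ], use = "pairwise")
})
# one small data frame per window end date, stacked; NULL entries are skipped by rbind
long <- do.call(rbind, lapply(seq_along(cors), function(i) {
  cm <- cors[[i]]
  if (is.null(cm)) return(NULL)
  ut <- which(upper.tri(cm), arr.ind = TRUE)
  data.frame(date = dat$date[i],
             pair = paste0(rownames(cm)[ut[, "row"]], colnames(cm)[ut[, "col"]]),
             cor = cm[ut])
}))
head(long)
```

With 10 rows and `w = 5` there are 6 complete windows, so `long` has 6 x 6 = 36 rows, starting at 2020-12-26.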