I am trying to run a circular regression in R using the circular package. My dataset is fairly large, ~85,000 rows and 6 variables. When I try to fit the model, I get an error reading "Error: cannot allocate vector of size 53.3 Gb." I am more of a statistician than a programmer, so I can't figure out how to fix this; it seems odd that such a large allocation is being requested, since my dataset itself is not that large. A fictional dataset and the code are below. Thank you.
library(circular)
set.seed(12)
n <- 80000
df <- data.frame(y  = rnorm(n, 2, 0.2),
                 x1 = rnorm(n, 100, 2),
                 x2 = rnorm(n, 0, 1),
                 x3 = rnorm(n, 9, 0.2),
                 x4 = rnorm(n, 0, 1),
                 x5 = rnorm(n, 1, 0.1))
y <- circular(df$y, type = "angles", units = "radians")
x <- model.matrix(y ~., data = df)
m1 <- lm.circular(y = y, x = x, type = "c-l", init = c(1,.01,.5,.5,.5,.5))
CodePudding user response:
The implementation sets up dense n x n diagonal matrices using
A <- diag(k * A1(k), nrow = n)
g.p <- diag(apply(x, 1, function(row, betaPrev) 2/(1 + (t(betaPrev) %*%
    row)^2), betaPrev = betaPrev), nrow = n)
(in circular:::LmCircularclRad) without any sparse-matrix tricks. A dense n x n matrix of doubles takes 8 * n^2 bytes, so for your example each of those matrices needs roughly 50 GB of memory, and that allocation fails.
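As a quick sanity check (assuming, as stated in the question, roughly 85,000 rows in the real data and 8-byte doubles):
n <- 85000
8 * n^2 / 1024^3   # ~53.8 GiB for one dense n x n matrix of doubles,
                   # which lines up with the "53.3 Gb" in the error message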
I don't think there is much you can do about this short of rewriting the routine to carry out the same calculations more efficiently. Linear algebra with diagonal matrices can usually be done with far less memory, because only the n diagonal entries are needed rather than the full n x n matrix, but you would have to look closely at this code to see whether that works here.
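For example, products involving a diagonal matrix can be computed from the vector of diagonal entries alone. This is a general illustration of the idea, not a patch for lm.circular; the object names here are made up:

n <- 5
d <- runif(n)                  # diagonal entries
X <- matrix(rnorm(n * 3), n)   # an n x 3 matrix

D   <- diag(d, nrow = n)       # the full n x n matrix (what the package builds)
r1  <- D %*% X
r2  <- t(X) %*% D %*% X

r1b <- d * X                   # same as D %*% X: row-wise scaling, no n x n matrix
r2b <- crossprod(X, d * X)     # same as t(X) %*% D %*% X

all.equal(r1, r1b)             # TRUE
all.equal(r2, r2b)             # TRUE

Written this way the storage is O(n) instead of O(n^2), so the ~50 GB allocations would disappear, but someone would have to rework circular:::LmCircularclRad along these lines.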