Generate two skew-normal variables with specified correlation-CodePudding

I want to add a predetermined correlation between two skew-normal variables generated with fGarch::rsnorm(). The variables also need to be positive with a predetermined mean, variance, and skewness.

Here is my attempt:

correlation <- 0.3
mean_1 <- 20
mean_2 <- 10
sd_1 <- 10
sd_2 <- 10
xi_1 <- 10
xi_2 <- 10
n <- 10000

x <- abs(fGarch::rsnorm(n, mean = mean_1, sd = sd_1, xi = xi_1))
y <- abs(fGarch::rsnorm(n, mean = mean_2, sd = sd_2, xi = xi_2))

x <- abs(x   correlation * (y - mean(y)))
y <- abs(y   correlation * (x - mean(x)))

hist(x)

hist(y)

cor(x,y)
#> [1] 0.4941751

^{Created on 2023-01-13 with reprex v2.0.2}

CodePudding user response：

Let's recreate your example but with a reproducible random seed

set.seed(1)

correlation <- 0.3
mean_1 <- 20
mean_2 <- 10
sd_1 <- 10
sd_2 <- 10
xi_1 <- 10
xi_2 <- 10
n <- 10000

x <- abs(fGarch::rsnorm(n, mean = mean_1, sd = sd_1, xi = xi_1))
y <- abs(fGarch::rsnorm(n, mean = mean_2, sd = sd_2, xi = xi_2))

Now of course x and y are effectively completely uncorrelated:

cor(x, y)
#> [1] -0.003209725

But if we sort both vectors, they become almost perfectly correlated.

x <- sort(x)
y <- sort(y)

cor(x, y)
#> [1] 0.9967296

But we want something in between, so let's shuffle x randomly, one pair at a time, and keep doing this until the correlation just falls below our target correlation:

while(cor(x, y) > correlation) {
  ij <- sample(n, 2)
  tmpx <- x[ij[1]]
  x[ij[1]] <- x[ij[2]]
  x[ij[2]] <- tmpx
}

Of course, we haven't changed the actual value of x or y, so their parameters are unchanged:

hist(x)


hist(y)


mean(x)
#> [1] 19.87497
mean(y)
#> [1] 10.60162

But now their correlation matches our target value:

cor(x, y)
#> [1] 0.2997019

^{Created on 2023-01-13 with reprex v2.0.2}

CodePudding user response：

Here's a starting point: to get this to work properly I'd have to spend more time reading the documentation and the relevant papers to figure out what the parameterizations are.

Start with library("sos"); findFn("{multivariate skew-normal}") to find the sn package. Then look at the examples in ?rmsn. This leads us to ?cp2dp:

For a multivariate distribution, there exists an extension based on the same logic; its components represent the vector mean value, the variance matrix, the vector of marginal coefficients of skewness ...

mean_v <- c(20, 10)
sd_v <- c(10, 10)
correlation <- 0.3
## from SD, cor to covariance matrix
vv <- outer(sd_v, sd_v) * matrix(c(1, correlation, correlation, 1), 2, 2) 
xi_v <- c(10, 10)
n <- 10000
## note different notation in sn vs fGarch: xi is location
## n.b. I haven't checked that sn 'alpha' is actually
##     equivalent to fGarch 'xi' ...
op_pars <- list(xi=mean_v, Omega=vv, alpha=xi_v)
dp_pars <- op2dp(op_pars, family = "SN")
set.seed(101)
y <- rmsn(n = n, dp = dp_pars)
colMeans(y)