How can I add two more variables with the following conditions?
- Variable "c" that has a 0.7 correlation with variable "a".
- If possible, variable "d" that correlates simultaneously with "a" and "b".
Simulated data
library(tidyverse)

n = 100
d = tibble(a = rnorm(n, 50, 20),
           b = rnorm(n, 10, 0.4))
d
Tidyverse solutions are much appreciated!
CodePudding user response:
Here is a small function that takes a vector x and a desired correlation rho, and returns a vector whose correlation with x is rho (i.e. cor(<vector>, x) == rho).
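f <- function(x, rho) {
  # residuals of a random vector regressed on x are uncorrelated with x
  orth = lm(runif(length(x)) ~ x)$residuals
  # mix x and the orthogonal part so that the correlation with x is rho
  rho * sd(orth) * x + orth * sd(x) * sqrt(1 - rho^2)
}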
Now we apply the function to column a to create a column c such that cor(a, c) == 0.7:
d %>% mutate(c = f(a,.7))
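As a quick sanity check, here is a minimal base-R sketch that restates f so it runs on its own and then verifies the resulting correlation. The seed and the name cc are only illustrative choices, not part of the answer above.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Same construction as f above: a component along x plus an
# orthogonal component, scaled so the sample correlation is rho.
f <- function(x, rho) {
  orth <- lm(runif(length(x)) ~ x)$residuals
  rho * sd(orth) * x + orth * sd(x) * sqrt(1 - rho^2)
}

a <- rnorm(100, 50, 20)
cc <- f(a, 0.7)

# Compare with a tolerance instead of ==, since the result is floating point
isTRUE(all.equal(cor(a, cc), 0.7))
```

Using all.equal() rather than == avoids a spurious FALSE caused by rounding error in the last digits.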
CodePudding user response:
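The second is actually easier (for me at least): just make z-scores out of both a and b and add or average them. The result correlates equally with a and b. The correlation is sqrt((1 + r) / 2), where r = cor(a, b), so it is about 0.7 when a and b are uncorrelated, as they are here.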
d <- d %>%
  mutate(d = ((a - mean(a)) / sd(a)) +
             ((b - mean(b)) / sd(b)))
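The claim about the z-score sum can be checked numerically. This is a small base-R sketch under the same simulation settings as the question; the seed and the names s, r and expected are illustrative assumptions. For standardized variables, cor(za + zb, za) = (1 + r) / sqrt(2 + 2r) = sqrt((1 + r) / 2), where r = cor(a, b).

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 100
a <- rnorm(n, 50, 20)
b <- rnorm(n, 10, 0.4)

# Sum of z-scores; scale() centres each vector and divides by its sd()
s <- as.numeric(scale(a)) + as.numeric(scale(b))

r <- cor(a, b)
expected <- sqrt((1 + r) / 2)

# The two observed correlations should both equal the expected value
c(cor_a = cor(a, s), cor_b = cor(b, s), expected = expected)
```

Because r is only approximately zero in a finite sample, the observed value is near 0.71 rather than exactly 0.7. Averaging instead of summing gives the same correlations, since rescaling does not change a correlation.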
CodePudding user response:
You can use the Iman and Conover method (1982), implemented in the mc2d package. It builds a rank correlation structure.
library(mc2d)
cc <- rnorm(n,50,20)
cc <- cornode(cbind(d$a,cc), target=0.7)[,"cc"]
d$c <- cc
cor(d)
To correlate more than two variables, you have to build a target correlation matrix.
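## Target
(corrTarget <- matrix(c(1,   0.7, 0.7,
                        0.7, 1,   0.7,
                        0.7, 0.7, 1), ncol=3))
dd <- rnorm(n,50,20)
# cornode() returns the columns rearranged to approach the target
dd <- cornode(cbind(a=d$a,b=d$b,dd), target=corrTarget)
cor(dd)
# keep the rearranged b as well, so that the rows stay aligned
d$b <- dd[,"b"]
d$d <- dd[,"dd"]
cor(d)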
The final correlation structure should be checked because it is not always possible to build the target correlation structure.
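One reason a target can be unattainable is that the matrix is not a valid correlation matrix at all: it must be symmetric, have a unit diagonal and be positive semi-definite. The sketch below checks this in base R before calling cornode. The helper is_valid_corr is hypothetical (it is not part of mc2d), and passing it is only a necessary condition, so the result of cor(d) should still be inspected.

```r
# Hypothetical helper (not part of mc2d): TRUE if m could be a correlation matrix
is_valid_corr <- function(m, tol = 1e-8) {
  isSymmetric(m) &&
    all(abs(diag(m) - 1) < tol) &&
    all(eigen(m, symmetric = TRUE, only.values = TRUE)$values > -tol)
}

# The target used above: all pairwise correlations equal to 0.7
ok <- matrix(c(1,   0.7, 0.7,
               0.7, 1,   0.7,
               0.7, 0.7, 1), ncol = 3)

# An inconsistent target: the first variable cannot correlate at 0.9 with the
# second and at -0.9 with the third while those two correlate at 0.9
bad <- matrix(c( 1,   0.9, -0.9,
                 0.9, 1,    0.9,
                -0.9, 0.9,  1), ncol = 3)

is_valid_corr(ok)   # TRUE
is_valid_corr(bad)  # FALSE
```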