Adding new variables to existing data that correlate with one or two existing ones

Time:03-13

How can I add two more variables with the following conditions?

  1. Variable "c" that has a 0.7 correlation with variable "a".
  2. If possible, variable "d" that correlates simultaneously with "a" and "b".

Simulated data

library(tidyverse)

set.seed(123)  # for reproducibility
n <- 100

d <- tibble(a = rnorm(n, 50, 20),
            b = rnorm(n, 10, 0.4))

d


Tidyverse solutions are much appreciated!

Answer 1:

Here is a small function that takes a vector x and a desired correlation rho and returns a vector y such that cor(y, x) == rho.

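# Return a vector whose sample correlation with x is exactly rho
f <- function(x, rho) {
  # Residuals of random noise regressed on x: mean zero, uncorrelated with x
  orth <- lm(runif(length(x)) ~ x)$residuals
  # Mix x with the orthogonal noise so that cor(result, x) == rho
  rho * sd(orth) * x + orth * sd(x) * sqrt(1 - rho^2)
}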
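Why this hits rho exactly: write o for the residuals orth. Because they come from a regression on x with an intercept, they have mean zero and zero sample covariance with x. With y = rho * s_o * x + sqrt(1 - rho^2) * s_x * o, where s_x and s_o are the sample standard deviations, the cross term vanishes and

$$
\begin{aligned}
\operatorname{cov}(y, x) &= \rho\, s_o\, s_x^2,\\
\operatorname{var}(y) &= \rho^2 s_o^2 s_x^2 + (1-\rho^2)\, s_x^2 s_o^2 = s_x^2 s_o^2,\\
\operatorname{cor}(y, x) &= \frac{\rho\, s_o\, s_x^2}{(s_x s_o)\, s_x} = \rho .
\end{aligned}
$$

The runif() draw only supplies the noise; any random vector would work the same way.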

Now apply the function to column a to create a column c such that cor(a, c) == 0.7:

d <- d %>% mutate(c = f(a, 0.7))
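A quick self-contained check of the function (repeated here so the snippet runs on its own; the seed and the variable names c_new and c_scaled are just for illustration). Since f() returns values on an arbitrary scale, the sketch also rescales the result to a's mean and sd; a positive linear rescaling does not change the correlation.

```r
# f() from the answer above
f <- function(x, rho) {
  orth <- lm(runif(length(x)) ~ x)$residuals
  rho * sd(orth) * x + orth * sd(x) * sqrt(1 - rho^2)
}

set.seed(42)
a <- rnorm(100, 50, 20)
c_new <- f(a, 0.7)
cor(a, c_new)  # 0.7 up to floating-point rounding

# Put c on the same scale as a (mean 50, sd 20); the correlation is unchanged
c_scaled <- 50 + 20 * as.numeric(scale(c_new))
cor(a, c_scaled)
```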

Answer 2:

The second one is actually easier (for me, at least): standardize a and b to z-scores and add (or average) them. The result correlates with both a and b at about 0.71 (1/sqrt(2)) when a and b are nearly uncorrelated, as they are in the simulated data.

# Sum of the z-scores of a and b
d <- d %>%
  mutate(d = (a - mean(a)) / sd(a) +
             (b - mean(b)) / sd(b))
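Where the 0.71 comes from: if the two z-scores have sample correlation r, then cor(z_a + z_b, z_a) = (1 + r) / sqrt(2 + 2r) = sqrt((1 + r) / 2), and the same holds for z_b. With r = 0 that is 1/sqrt(2), about 0.71. The sketch below checks this numerically; it uses base R instead of dplyr only so it runs without extra packages, and the seed is arbitrary.

```r
set.seed(7)
n <- 100
d <- data.frame(a = rnorm(n, 50, 20), b = rnorm(n, 10, 0.4))
# Sum of the two z-scores, as in the answer above
d$d <- (d$a - mean(d$a)) / sd(d$a) + (d$b - mean(d$b)) / sd(d$b)

r_ab <- cor(d$a, d$b)
# Both correlations equal sqrt((1 + r_ab) / 2)
c(cor_da = cor(d$d, d$a), cor_db = cor(d$d, d$b), formula = sqrt((1 + r_ab) / 2))
```

So the 0.7 is not a free parameter here: it is set by how correlated a and b already are.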

Answer 3:

Another option is the Iman and Conover (1982) method, implemented in the mc2d package, which builds a target rank correlation structure.

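library(mc2d)

# Draw a new variable, then let cornode() rearrange it toward a rank correlation of 0.7 with a
cc <- rnorm(n, 50, 20)
cc <- cornode(cbind(a = d$a, cc), target = 0.7)[, "cc"]
d$c <- cc
cor(d)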
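To show what such a reordering does, here is a minimal base-R sketch of the Iman-Conover idea. It is not mc2d's cornode() implementation: iman_conover_sketch() is a hypothetical helper, and keeping the first column in its original order is a choice made here so the existing variable a is left untouched. It draws normal scores, uses Cholesky factors to give them exactly the target Pearson correlation, and then reorders each data column to follow the ranks of the matching score column.

```r
# Hypothetical helper, not mc2d's cornode(): base-R sketch of Iman-Conover.
# Column 1 of x keeps its order; other columns are reordered so that their
# rank correlations approach `target`.
iman_conover_sketch <- function(x, target) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- ncol(x)
  # van der Waerden scores, shuffled independently in each column
  scores <- sapply(seq_len(k), function(j) sample(qnorm(seq_len(n) / (n + 1))))
  # Upper Cholesky factors: t(f_chol) %*% f_chol == cor(scores), same for target
  f_chol <- chol(cor(scores))
  p_chol <- chol(target)
  # Remove the accidental correlation of the scores, then impose the target
  t_mat <- scores %*% solve(f_chol) %*% p_chol
  # Permute rows of t_mat so its first column is ranked like x[, 1]
  t_mat <- t_mat[order(t_mat[, 1])[rank(x[, 1], ties.method = "first")], , drop = FALSE]
  # Give each column of x the rank order of the matching column of t_mat
  out <- x
  for (j in seq_len(k)) {
    out[, j] <- sort(x[, j])[rank(t_mat[, j], ties.method = "first")]
  }
  out
}

set.seed(1)
n <- 1000
a <- rnorm(n, 50, 20)
cc <- rnorm(n, 50, 20)
res <- iman_conover_sketch(cbind(a = a, c = cc), target = matrix(c(1, 0.7, 0.7, 1), 2))
cor(res, method = "spearman")  # off-diagonal close to, not exactly, 0.7
```

Only the ranks are matched, so the values of cc are unchanged (just reordered) and the achieved rank correlation lands near the target rather than exactly on it.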

To make a new variable correlate with several existing ones, you have to build a target correlation matrix.

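# Target correlation matrix for (a, b, dd): every pair at 0.7
(corrTarget <- matrix(c(1, 0.7, 0.7, 0.7, 1, 0.7, 0.7, 0.7, 1), ncol = 3))

dd <- rnorm(n, 50, 20)
dd <- cornode(cbind(a = d$a, b = d$b, dd), target = corrTarget)
cor(dd)
# The target also sets cor(a, b) = 0.7, so b is taken from cornode()'s output too
d$b <- dd[, "b"]
d$d <- dd[, "dd"]

cor(d)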

Always check the final correlation structure, because the target cannot always be reached. Since the method works on ranks, cor(d, method = "spearman") is the more direct comparison with the target.
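One reason a target can be out of reach is that the matrix is not a valid correlation matrix at all: it must be symmetric, have a unit diagonal and have no negative eigenvalues (the Iman-Conover construction factors the target with a Cholesky decomposition, which additionally needs it to be positive definite). The helper is_valid_corr() below is a hypothetical sketch of that check in base R, and the bad matrix is a made-up counterexample.

```r
# Hypothetical helper: TRUE if m is symmetric, has a unit diagonal and
# no eigenvalue below -tol (i.e. is positive semi-definite)
is_valid_corr <- function(m, tol = 1e-8) {
  isSymmetric(m) &&
    all(abs(diag(m) - 1) < tol) &&
    all(eigen(m, symmetric = TRUE, only.values = TRUE)$values > -tol)
}

# Target from the answer: eigenvalues 2.4, 0.3, 0.3, so it is valid
corrTarget <- matrix(c(1, 0.7, 0.7, 0.7, 1, 0.7, 0.7, 0.7, 1), ncol = 3)
is_valid_corr(corrTarget)

# cor(a, b) = 0.9 and cor(a, c) = 0.9 force cor(b, c) >= 0.62,
# so cor(b, c) = -0.9 is impossible
bad <- matrix(c(1, 0.9, 0.9, 0.9, 1, -0.9, 0.9, -0.9, 1), ncol = 3)
is_valid_corr(bad)
```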
