Home > Net >  Simulate effect of confounder on two random variables
Simulate effect of confounder on two random variables

Time:11-22

I want to generate some data in order to show partial correlation to control for a confounder.

Specifically, I want to generate data about two uncorrelated random variables (let's say speech and memory) and use a third variable to influence them both (age).

I'd expect to observe a strong correlation between speech and memory, due to the confounder age, and no correlation between the same two variables if I control for age (that is, calculate a partial correlation on age).

That said, I can't generate the strong correlation with my code.


age <- rep(1:10, 10)

speech <- age * abs(rnorm(100))
memory <- age * abs(rnorm(100))

cor(speech, memory) # correlation, it should be high but it's not

residuals_speech <- lm(speech ~ age)$residuals
residuals_memory <- lm(memory ~ age)$residuals

cor(residuals_speech, residuals_memory) # partial correlation controlling for age, it should be around zero

CodePudding user response:

The random noise needs to be additive, not multiplicative:

set.seed(1)

age <- rep(1:10, 10)

speech <- age   rnorm(100)
memory <- age   rnorm(100)

Now we have a high correlation between speech and memory:

cor(speech, memory)
#> [1] 0.9067772

But the correlation of the residuals once age is taken into account is close to zero, as expected.

residuals_speech <- lm(speech ~ age)$residuals
residuals_memory <- lm(memory ~ age)$residuals

cor(residuals_speech, residuals_memory) 
#> [1] 0.0001755437

To get speech and memory to be on different scales with different amounts of noise (as real data might be), you can multiply age and the random noise by scalar values. This will retain the high correlation caused by the confounder.

set.seed(1)

age <- rep(1:10, 10)

speech <- 5 * age   3 * rnorm(100)
memory <- 2.1 * age - 2 * rnorm(100)

cor(speech, memory)
#> [1] 0.9361466

residuals_speech <- lm(speech ~ age)$residuals
residuals_memory <- lm(memory ~ age)$residuals

cor(residuals_speech, residuals_memory) 
#> [1] -0.0001755437

Created on 2022-11-21 with reprex v2.0.2

  • Related