How to correlate two variables in R only for specific observations

Time:11-12

I want to correlate two variables in R, but only for specific values of two other variables. For example, I want to look at the correlation between test scores (v1) and study time (v2) for people aged 16 or older who are female (gender=1).

In Stata, I would write something like this:

pwcorr v1 v2 if age>15 & gender==1

How would I do this in R? Is there a way to do this without creating a data subset?

CodePudding user response:

Here's an example of how to do this with the tidyverse for two variables from the mtcars dataset:

library(tidyverse)

dt <- mtcars
dt %>%
  dplyr::filter(cyl == 4 & hp < 100) %>%
  dplyr::summarise(corMpgWt = cor(mpg,wt))

and with base R:

cor(dt[dt$cyl == 4 & dt$hp < 100, c("mpg", "wt")])
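
If you only want the single coefficient rather than a 2x2 matrix, one possible sketch is to index the two columns directly with a logical vector, so no filtered data frame is stored. The name keep is just an illustrative choice, not anything required by R:

```r
# Built-in dataset, so the example runs as-is
dt <- mtcars

# Logical vector marking the rows of interest (hypothetical name)
keep <- dt$cyl == 4 & dt$hp < 100

# Single correlation coefficient for the selected rows
r_single <- cor(dt$mpg[keep], dt$wt[keep])

# Matrix version from the base R one-liner, for comparison
r_matrix <- cor(dt[keep, c("mpg", "wt")])

print(r_single)
print(r_matrix["mpg", "wt"])  # same value as r_single
```

Both calls use the same rows, so the off-diagonal entry of the matrix matches the single coefficient.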

CodePudding user response:

Let's assume your data.frame is called df. You can use the dplyr package as follows:

library(dplyr)

# Column names are case-sensitive in R; v1 and v2 match the question
df %>%
  filter(age > 15,
         gender == 1) %>%
  select(v1, v2) %>%
  cor()

If you prefer base R functions, here's an approach:

cor(subset(df, age > 15 & gender == 1)[, c("v1", "v2")])
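
On the "without creating a data subset" part of the question: a rough sketch, assuming a data frame with the columns named in the question (the values below are made up for illustration). cor.test() has a formula interface with a subset argument, which reads much like the Stata if clause, and with() lets cor() index the columns in place. Missing values are an extra assumption here; for two variables, use = "complete.obs" drops incomplete pairs, which should correspond to the pairwise deletion that pwcorr applies.

```r
# Hypothetical data: column names follow the question, values are made up
set.seed(1)
df <- data.frame(
  v1     = rnorm(50, mean = 70, sd = 10),      # test scores
  v2     = rnorm(50, mean = 5, sd = 2),        # study time
  age    = sample(12:20, 50, replace = TRUE),
  gender = sample(0:1, 50, replace = TRUE)
)
df$v2[3] <- NA  # one missing value, to show the NA handling

# The condition goes in `subset`; no filtered copy of df is assigned
res <- cor.test(~ v1 + v2, data = df, subset = age > 15 & gender == 1)
print(res$estimate)

# Same coefficient with cor(); `use` controls how NAs are dropped
r <- with(df, cor(v1[age > 15 & gender == 1],
                  v2[age > 15 & gender == 1],
                  use = "complete.obs"))
print(r)
```

cor.test() also reports a p-value and confidence interval, so it may be more than you need if only the coefficient matters.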