I want to correlate two variables in R, but only for specific values of two other variables. For example, I want to look at the correlation between test scores (v1) and study time (v2) for people aged 16 or older who are female (gender=1).
In Stata, I would write something like this:
pwcorr v1 v2 if age>15 & gender==1
How would I do this in R? Is there a way to do this without creating a data subset?
CodePudding user response:
Here's an example of how to do this with the tidyverse for two variables from the mtcars dataset:
library(tidyverse)
dt <- mtcars
dt %>%
  dplyr::filter(cyl == 4 & hp < 100) %>%
  dplyr::summarise(corMpgWt = cor(mpg, wt))
and with base R:
cor(dt[dt$cyl == 4 & dt$hp < 100, c("mpg", "wt")])
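Note that the base R call above returns a 2x2 correlation matrix. If you only want the single coefficient and would rather not store a subset, something along these lines should work. It is a minimal sketch on the same mtcars example; the names keep, r_mpg_wt and r_inline are just illustrative:

```r
# mtcars ships with base R (datasets package), so no extra packages are needed.
# Logical index of the rows to keep: 4-cylinder cars with less than 100 hp.
keep <- mtcars$cyl == 4 & mtcars$hp < 100

# Correlate the two columns for those rows only; returns a single number.
r_mpg_wt <- cor(mtcars$mpg[keep], mtcars$wt[keep])

# Equivalent one-liner: subset() filters the rows on the fly,
# and with() evaluates cor() inside the filtered data.
r_inline <- with(subset(mtcars, cyl == 4 & hp < 100), cor(mpg, wt))
```

Either way R still filters the rows internally, but nothing is saved to your workspace as a separate data set.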
CodePudding user response:
Let's assume your data.frame is called df. You can rely on the dplyr package and do the following:
library(dplyr)
df %>%
  filter(age > 15,
         gender == 1) %>%
  select(v1, v2) %>%
  cor()
If you prefer base R functions, here's an approach:
cor(subset(df, age > 15 & gender == 1)[, c("v1", "v2")])
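One difference from Stata worth keeping in mind: pwcorr uses pairwise deletion for missing values, whereas cor() by default returns NA as soon as a missing value is involved. The sketch below uses a small made-up data frame (the values are hypothetical, only the column names come from the question) to show the use argument of cor():

```r
# Hypothetical toy data; column names follow the question.
df <- data.frame(
  v1     = c(80, 65, NA, 90, 72, 85),   # test scores, one missing
  v2     = c(10,  6,  8, 12,  7, 11),   # study time
  age    = c(17, 16, 18, 15, 20, 19),
  gender = c( 1,  1,  1,  1,  0,  1)
)

# Keep females aged 16 or older, and only the two columns of interest.
sel <- subset(df, age > 15 & gender == 1, select = c(v1, v2))

# Default behaviour: every entry that involves v1 is NA.
r_default <- cor(sel)

# Pairwise deletion: each pair of columns uses the rows where both are observed.
r_pairwise <- cor(sel, use = "pairwise.complete.obs")
```

With only two columns, use = "complete.obs" gives the same result; the two options differ once you correlate three or more variables at the same time.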