For example, I have two data sets, A and B, each with a score column, a location column (England or Wales) and a month column. Data set A only covers the months January to October, while data set B only covers April to November. Is there a way to filter both so that they only include the overlapping months, April to October? I need this to pair the data for statistical tests.
My actual data set has over a hundred categorical variables, and roughly half of them don't match between the groups, so doing this manually isn't practical.
Answer:
Does this reproducible example capture what you want to do?
library(tidyverse)
dfa <- tribble(~location, ~month, ~a_score,
"England", 1, 1,
"England", 2, 1,
"England", 3, 1,
"Wales", 1, 1,
"Wales", 2, 1,
"Wales", 3, 1
)
dfb <- tribble(~location, ~month, ~b_score,
"England", 2, 2,
"England", 3, 2,
"England", 4, 2,
"Wales", 2, 2,
"Wales", 3, 2,
"Wales", 4, 2
)
# inner_join() keeps only the location/month combinations present in both data sets
dfa |> inner_join(dfb, by = c("location", "month"))
#> # A tibble: 4 × 4
#> location month a_score b_score
#> <chr> <dbl> <dbl> <dbl>
#> 1 England 2 1 2
#> 2 England 3 1 2
#> 3 Wales 2 1 2
#> 4 Wales 3 1 2
Created on 2022-05-16 by the reprex package (v2.0.1)
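If you would rather keep A and B as separate tables instead of merging them, a minimal sketch using `dplyr::semi_join()` could look like the code below. `semi_join(x, y)` keeps the rows of `x` that have a match in `y` without adding any columns from `y`. To avoid listing a hundred key columns by hand, it assumes that every column name shared by the two data sets is a key (here `location` and `month`). That assumption breaks if the score columns have the same name in A and B, so rename them first or drop them from `keys`. The names `keys`, `dfa_shared` and `dfb_shared` are illustrative only.

```r
library(dplyr)
library(tibble)

# Toy data mirroring the reprex above: A covers months 1-3, B covers months 2-4
dfa <- tribble(~location, ~month, ~a_score,
  "England", 1, 1,
  "England", 2, 1,
  "England", 3, 1,
  "Wales",   1, 1,
  "Wales",   2, 1,
  "Wales",   3, 1
)
dfb <- tribble(~location, ~month, ~b_score,
  "England", 2, 2,
  "England", 3, 2,
  "England", 4, 2,
  "Wales",   2, 2,
  "Wales",   3, 2,
  "Wales",   4, 2
)

# Assumption: every column name the two data sets share is a grouping key.
# This only works if the score columns are named differently in A and B.
keys <- intersect(names(dfa), names(dfb))

# Keep only rows whose key combination also appears in the other data set,
# then sort both the same way so that row i of A pairs with row i of B
dfa_shared <- dfa |> semi_join(dfb, by = keys) |> arrange(location, month)
dfb_shared <- dfb |> semi_join(dfa, by = keys) |> arrange(location, month)

dfa_shared
dfb_shared
```

Both filtered tables keep only months 2 and 3 for each location. For a paired test, the `inner_join()` version above is usually simpler, because it already places each pair on the same row.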