I have a df as below, and I would like to get, for each ID, what is in Subject1 but not in Subject2, and what is in Subject2 but not in Subject1. Any suggestions?
df <- structure(list(ID = c("Tom", "Jerry", "Marry"), Subject1 = c("Art; Math",
"ELA;Math", "PE; Math; ELA"), Subject2 = c("Math; PE", "Math; ELA",
"Math; Bio")), row.names = c(NA, -3L), class = c("tbl_df", "tbl",
"data.frame"))
We could split both columns on ";" and use map2 to take the set difference (setdiff) between the two resulting list columns, in each direction:
library(dplyr)
library(purrr)
library(tidyr)
library(stringr)
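df %>%
  mutate(In = map2(
    # split each "a; b; c" string into a character vector
    strsplit(Subject1, ";\\s*"),
    strsplit(Subject2, ";\\s*"),
    # one-row tibble per ID holding the set difference in each direction
    ~ tibble(`_1_notin_2` = str_c(setdiff(.x, .y), collapse = "; "),
             `_2_notin_1` = str_c(setdiff(.y, .x), collapse = "; ")))) %>%
  # spread the tibble column into In_1_notin_2 and In_2_notin_1
  unnest_wider(In, names_sep = "")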
Output:
# A tibble: 3 × 5
ID Subject1 Subject2 In_1_notin_2 In_2_notin_1
<chr> <chr> <chr> <chr> <chr>
1 Tom Art; Math Math; PE "Art" "PE"
2 Jerry ELA;Math Math; ELA "" ""
3 Marry PE; Math; ELA Math; Bio "PE; ELA" "Bio"
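To see what map2 does for each ID, here is the same split-and-setdiff step run by hand on Jerry's and Marry's values. This is a minimal base R sketch, not part of the answer itself; the names s1, s2, m1 and m2 are only for illustration. It shows why splitting on the regex ";\\s*" matters (it makes "ELA;Math" and "Math; ELA" compare equal) and why setdiff() has to be called in both directions (it is asymmetric).

```r
# Jerry's row: the separators differ ("ELA;Math" vs "Math; ELA")
s1 <- strsplit("ELA;Math", ";\\s*")[[1]]   # "ELA" "Math"
s2 <- strsplit("Math; ELA", ";\\s*")[[1]]  # "Math" "ELA"
setdiff(s1, s2)  # character(0): nothing in Subject1 is missing from Subject2
setdiff(s2, s1)  # character(0)

# Marry's row: setdiff() is asymmetric, so it is called both ways
m1 <- strsplit("PE; Math; ELA", ";\\s*")[[1]]
m2 <- strsplit("Math; Bio", ";\\s*")[[1]]
paste(setdiff(m1, m2), collapse = "; ")  # "PE; ELA"
paste(setdiff(m2, m1), collapse = "; ")  # "Bio"

# Splitting on ";" alone keeps the leading space, so " ELA" would not match "ELA"
strsplit("Math; ELA", ";")[[1]]  # "Math" " ELA"
```

An empty difference collapses to "", which is why Jerry's row shows "" in both new columns.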
This could also be a use case for rowwise(), which lets setdiff() work directly on each row's split values:
library(dplyr)
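# the native pipe |> needs R >= 4.1
df |>
  rowwise() |>
  mutate(
    # turn each Subject string into a list column of split values
    across(starts_with("S"), ~ strsplit(., ";\\s*")),
    # inside rowwise(), Subject1/Subject2 now refer to each row's character vector
    In_1_notin_2 = paste(setdiff(Subject1, Subject2), collapse = "; "),
    In_2_notin_1 = paste(setdiff(Subject2, Subject1), collapse = "; "),
    # collapse the split values back into "; "-separated strings
    across(starts_with("S"), ~ paste(., collapse = "; "))
  ) |>
  ungroup()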
Output (the Subject columns are rebuilt with paste(), so Jerry's "ELA;Math" comes back as "ELA; Math"):
# A tibble: 3 × 5
ID Subject1 Subject2 In_1_notin_2 In_2_notin_1
<chr> <chr> <chr> <chr> <chr>
1 Tom Art; Math Math; PE "Art" "PE"
2 Jerry ELA; Math Math; ELA "" ""
3 Marry PE; Math; ELA Math; Bio "PE; ELA" "Bio"
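For a version without package dependencies, the same two columns can be computed in base R with mapply(). This is a sketch that is not taken from either answer; the helper diff_str() is a hypothetical name, and the question's data is rebuilt as a plain data.frame so the snippet runs on its own.

```r
df <- data.frame(
  ID = c("Tom", "Jerry", "Marry"),
  Subject1 = c("Art; Math", "ELA;Math", "PE; Math; ELA"),
  Subject2 = c("Math; PE", "Math; ELA", "Math; Bio")
)

# Hypothetical helper: elements of a that are not in b, joined with "; "
diff_str <- function(a, b) paste(setdiff(a, b), collapse = "; ")

# One character vector of subjects per row
s1 <- strsplit(df$Subject1, ";\\s*")
s2 <- strsplit(df$Subject2, ";\\s*")

# mapply() walks the two lists in parallel; each call returns one string,
# so the result simplifies to a character vector
df$In_1_notin_2 <- mapply(diff_str, s1, s2, USE.NAMES = FALSE)
df$In_2_notin_1 <- mapply(diff_str, s2, s1, USE.NAMES = FALSE)
df
```

Unlike the rowwise() version, this leaves Subject1 and Subject2 untouched, so "ELA;Math" stays as written while the new columns hold the same values as in both outputs.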