I have a dataset like this:
structure(list(year = c(2019, 2019, 2019, 2019, 2019, 2019),
venue = c("Z", "Z", "Z", "Z", "O", "D"), HO = c("X", "Y",
"X", "Y", "W", "J"), AW = c("Y", "X", "W", "T", "T", "X")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
I need to create a categorical variable that returns "yes" or "no".
structure(list(year = c(2019, 2019, 2019, 2019, 2019, 2019),
venue = c("Z", "Z", "Z", "Z", "O", "D"), HO = c("X", "Y",
"X", "Y", "W", "J"), AW = c("Y", "X", "W", "T", "T", "X"),
Cat = c("yes", "yes", "no", "no", "no", "no")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
As note, the first two rows are cases for "yes". These are the same in the attributes "year" and "venue", regardless of the order in "HO" and "AW".
Thanks.
CodePudding user response:
You can first group your dataframe by row using rowwise()
, then paste
the HO
and AW
columns together. This step is to see which rows have the same HO
and AW
regardless of order. Then group_by
the unique combination to categorise them.
library(dplyr)
library(stringr)
df %>%
rowwise() %>%
mutate(paste_col = paste0(sort(str_split(paste0(HO, AW), "", simplify = T)), collapse = ",")) %>%
group_by(year, venue, paste_col) %>%
mutate(Cat = ifelse(n() > 1, "Yes", "No")) %>%
ungroup() %>%
select(-paste_col)
# A tibble: 6 × 5
year venue HO AW Cat
<dbl> <chr> <chr> <chr> <chr>
1 2019 Z X Y Yes
2 2019 Z Y X Yes
3 2019 Z X W No
4 2019 Z Y T No
5 2019 O W T No
6 2019 D J X No