I have variables that are multi-select in my data frame. They look like this:
ID Titles
1 Title 1, Title 2, Title 3
2 Title 4, Title 6, Title 1
3 Title 2, Title 5, Title 3
I want to split the multi-select variable "Title" into separate columns based on the title names, like this:
ID Titles Title 1 Title 2 Title 3 Title 4 Title 5 Title 6
1 Title 1, Title 2, Title 3 Title 1 Title 2 Title 3
2 Title 4, Title 6, Title 1 Title 1 Title 4 Title 6
3 Title 2, Title 5, Title 3 Title 2 Title 3 Title 5
Is there a way that I can split strings (by comma) and have the output but separate columns with each title organized by their respective names?
Any help would be great!
I can only split my multi-select item using this function: str_split_fixed, but i don't know how to get my strings organized by their names because it is not in order in the original multi-select variable.
CodePudding user response:
Data
dat <- structure(list(ID = 1:3,
Titles = c("Title1,Title2,Title3", "Title4,Title6,Title1","Title2,Title5,Title3")),
class = "data.frame", row.names = c(NA,-3L))
Code
dat %>%
full_join(
dat %>%
separate_rows(Titles,sep = ",") %>%
count(ID,Titles) %>%
pivot_wider(names_from = Titles,values_from = Titles)
) %>%
select(ID,Titles,paste0("Title",1:6))
Output
ID Titles Title1 Title2 Title3 Title4 Title5 Title6
1 1 Title1,Title2,Title3 Title1 Title2 Title3 <NA> <NA> <NA>
2 2 Title4,Title6,Title1 Title1 <NA> <NA> Title4 <NA> Title6
3 3 Title2,Title5,Title3 <NA> Title2 Title3 <NA> Title5 <NA>
CodePudding user response:
We may use dummy_cols
from fastDummies
to create binary columns and then replace the 1s with the column name
library(dplyr)
library(fastDummies)
library(stringr)
dummy_cols(df1, "Titles", split = ",\\s*") %>%
rename_with(~ str_remove(.x, ".*_")) %>%
mutate(across(-(1:2), ~ case_when(.x == 1 ~ cur_column())))
-output
ID Titles Title1 Title2 Title3 Title5 Title4 Title6
1 1 Title1,Title2,Title3 Title1 Title2 Title3 <NA> <NA> <NA>
2 2 Title4,Title6,Title1 Title1 <NA> <NA> <NA> Title4 Title6
3 3 Title2,Title5,Title3 <NA> Title2 Title3 Title5 <NA> <NA>
Or may also do
library(purrr)
library(tibble)
library(tidyr)
df1 %>%
mutate(Title = map(strsplit(Titles, ",\\s*"),
~ as_tibble_row(setNames(.x, .x)))) %>%
unnest(Title)
-output
# A tibble: 3 × 8
ID Titles Title1 Title2 Title3 Title4 Title6 Title5
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 Title1,Title2,Title3 Title1 Title2 Title3 <NA> <NA> <NA>
2 2 Title4,Title6,Title1 Title1 <NA> <NA> Title4 Title6 <NA>
3 3 Title2,Title5,Title3 <NA> Title2 Title3 <NA> <NA> Title5
data
df1 <- structure(list(ID = 1:3, Titles = c("Title1,Title2,Title3", "Title4,Title6,Title1",
"Title2,Title5,Title3")), class = "data.frame",
row.names = c(NA,
-3L))