Home > Blockchain >  Is there an R function to split multi-select and organize into columns based on text?
Is there an R function to split multi-select and organize into columns based on text?

Time:01-13

I have variables that are multi-select in my data frame. They look like this:

ID Titles
1  Title 1, Title 2, Title 3
2  Title 4, Title 6, Title 1
3  Title 2, Title 5, Title 3

I want to split the multi-select variable "Title" into separate columns based on the title names, like this:

ID Titles                     Title 1  Title 2  Title 3  Title 4  Title 5  Title 6
1  Title 1, Title 2, Title 3  Title 1  Title 2  Title 3 
2  Title 4, Title 6, Title 1  Title 1                    Title 4           Title 6
3  Title 2, Title 5, Title 3           Title 2  Title 3           Title 5

Is there a way that I can split strings (by comma) and have the output but separate columns with each title organized by their respective names?

Any help would be great!

I can only split my multi-select item using this function: str_split_fixed, but i don't know how to get my strings organized by their names because it is not in order in the original multi-select variable.

CodePudding user response:

Data

dat <- structure(list(ID = 1:3,
                      Titles = c("Title1,Title2,Title3", "Title4,Title6,Title1","Title2,Title5,Title3")),
                 class = "data.frame", row.names = c(NA,-3L))

Code

dat %>% 
  full_join(
    dat %>% 
      separate_rows(Titles,sep = ",") %>% 
      count(ID,Titles) %>% 
      pivot_wider(names_from = Titles,values_from = Titles)
  ) %>% 
  select(ID,Titles,paste0("Title",1:6))

Output

  ID               Titles Title1 Title2 Title3 Title4 Title5 Title6
1  1 Title1,Title2,Title3 Title1 Title2 Title3   <NA>   <NA>   <NA>
2  2 Title4,Title6,Title1 Title1   <NA>   <NA> Title4   <NA> Title6
3  3 Title2,Title5,Title3   <NA> Title2 Title3   <NA> Title5   <NA>

CodePudding user response:

We may use dummy_cols from fastDummies to create binary columns and then replace the 1s with the column name

library(dplyr)
library(fastDummies)
library(stringr)
 dummy_cols(df1, "Titles", split = ",\\s*") %>%
  rename_with(~ str_remove(.x, ".*_")) %>%
  mutate(across(-(1:2), ~ case_when(.x == 1 ~ cur_column())))

-output

  ID               Titles Title1 Title2 Title3 Title5 Title4 Title6
1  1 Title1,Title2,Title3 Title1 Title2 Title3   <NA>   <NA>   <NA>
2  2 Title4,Title6,Title1 Title1   <NA>   <NA>   <NA> Title4 Title6
3  3 Title2,Title5,Title3   <NA> Title2 Title3 Title5   <NA>   <NA>

Or may also do

library(purrr)
library(tibble)
library(tidyr)
df1 %>% 
   mutate(Title = map(strsplit(Titles, ",\\s*"),
      ~ as_tibble_row(setNames(.x, .x)))) %>% 
     unnest(Title)

-output

# A tibble: 3 × 8
     ID Titles               Title1 Title2 Title3 Title4 Title6 Title5
  <int> <chr>                <chr>  <chr>  <chr>  <chr>  <chr>  <chr> 
1     1 Title1,Title2,Title3 Title1 Title2 Title3 <NA>   <NA>   <NA>  
2     2 Title4,Title6,Title1 Title1 <NA>   <NA>   Title4 Title6 <NA>  
3     3 Title2,Title5,Title3 <NA>   Title2 Title3 <NA>   <NA>   Title5

data

df1 <- structure(list(ID = 1:3, Titles = c("Title1,Title2,Title3", "Title4,Title6,Title1", 
"Title2,Title5,Title3")), class = "data.frame", 
row.names = c(NA, 
-3L))
  • Related