Split variable from comma into an ordered dataframe

Time:07-11

I have a data frame like this, where the values in each row are separated by commas.

# Events
# A,B,C
# C,D
# B,A
# D,B,A,E
# A,E,B

I would like to get the following data frame:

# Event1  Event2  Event3  Event4  Event5
# A       B       C       NA      NA
# NA      NA      C       D       NA
# A       B       NA      NA      NA
# A       B       NA      D       E
# A       B       NA      NA      E

I have tried cSplit, but it doesn't give me the desired data frame. Is this possible?

NOTE: The values don't always appear in the input in the same position as their Event column in the second data frame.

CodePudding user response:

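Here is a base R solution. Split each row on commas, giving a list s, and build cols, the sorted unique values that occur anywhere in the input. Then, for each element of s, keep the entries of cols that are present and put NA in the remaining positions, and convert the result to a data frame.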

s <- strsplit(DF$Events, ",")
cols <- unique(sort(unlist(s)))

data.frame(Event = t(sapply(s, function(x) ifelse(cols %in% x, cols, NA))))

giving:

  Event.1 Event.2 Event.3 Event.4 Event.5
1       A       B       C    <NA>    <NA>
2    <NA>    <NA>       C       D    <NA>
3       A       B    <NA>    <NA>    <NA>
4       A       B    <NA>       D       E
5       A       B    <NA>    <NA>       E
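
To make the per-row step easier to follow, here is a minimal sketch of what the anonymous function does for a single row. The values of cols and x are typed in by hand for illustration (they correspond to the second input row, "C,D"), and the names present and row_out are introduced only for this example.

```r
# All possible values, as produced by unique(sort(unlist(s)))
cols <- c("A", "B", "C", "D", "E")

# One split row, here the second row of the input ("C,D")
x <- c("C", "D")

# Logical mask: which of the possible values occur in this row?
present <- cols %in% x

# Keep the value where it occurs, NA elsewhere; positions follow cols, not x
row_out <- ifelse(present, cols, NA)

print(present)
print(row_out)
```

Because the positions come from cols rather than from the order inside each row, a row such as "B,A" still ends up with A in the first column and B in the second.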

Note

The input in reproducible form:

Lines <- "Events
A,B,C
C,D
B,A
D,B,A,E
A,E,B"
DF <- read.table(text = Lines, header = TRUE, strip.white = TRUE)

CodePudding user response:

Another approach using tidyverse:

library(dplyr)
library(purrr)
library(stringr)

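# Input vector of comma-separated events
Events <- c("A,B,C", "C,D", "B,A", "D,B,A,E", "A,E,B")

# Sorted unique values; named event_values to avoid masking base R's `letters`
event_values <- Events %>% str_split(",") %>% unlist() %>% unique() %>% sort()

df <- data.frame(Events)

# For each value, keep it where it occurs in a row and use NA elsewhere
df %>%
  map2_dfc(.y = event_values, ~ ifelse(str_detect(.x, .y), .y, NA)) %>%
  set_names(nm = paste0("Events", seq_along(event_values)))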

#> # A tibble: 5 × 5
#>   Events1 Events2 Events3 Events4 Events5
#>   <chr>   <chr>   <chr>   <chr>   <chr>  
#> 1 A       B       C       <NA>    <NA>   
#> 2 <NA>    <NA>    C       D       <NA>   
#> 3 A       B       <NA>    <NA>    <NA>   
#> 4 A       B       <NA>    D       E      
#> 5 A       B       <NA>    <NA>    E

Created on 2022-07-11 by the reprex package (v2.0.1)
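
One caveat with detecting values by pattern matching: it looks for substrings, so it is only safe when no value is contained in another, as with the single letters here. The sketch below uses made-up data (a hypothetical value "AB") to show the difference between substring matching and exact matching on the split pieces. It is written in base R so that it has no package dependencies, with grepl(fixed = TRUE) standing in for str_detect; the names substring_hit and exact_hit are introduced only for this example.

```r
# Hypothetical input: "AB" contains "A" as a substring
Events <- c("AB,C", "A,C")

# Substring matching: the first row is flagged although it has no "A" value
substring_hit <- grepl("A", Events, fixed = TRUE)

# Exact matching: split first, then test membership among the pieces
parts <- strsplit(Events, ",", fixed = TRUE)
exact_hit <- vapply(parts, function(x) "A" %in% x, logical(1))

print(substring_hit)
print(exact_hit)
```

If the real data can contain such overlapping values, testing membership on the split pieces (as the base R answer does with %in%) is probably the safer choice.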
