Home > Enterprise >  Separate Multiple Columns by "/" and ","
Separate Multiple Columns by "/" and ","

Time:12-01

I'm cleaning some data where there are multiple columns that need to be split into rows with both ',' and '/'. Data table below to explain what it the source code looks like.

df <- data.table(
   b = c("a", "d/e/f", "g,h"),
     c = c("1", "2,3,4", "5/6")
   )

I've tried using separate_rows, but it can only split one column on one of these separators at a time.

EDIT: The data table I'm looking for looks approximately like this:

df_clean <- data.table(
  b = c("a", "d", "d", "d", 
        "e", "e", "e", "f", 
        "f", "f", "g", "g",
        "h", "h"),
  c = c("1", "2", "3", "4",
        "2", "3", "4",
        "2", "3", "4",
        "5", "6", 
        "5", "6")
)

CodePudding user response:

Updated answer based on added clarification.

Run separate_rows once on each column to get all of the permutations. You can use a regex pattern to specify multiple separators.

library(tidyr)

df %>%
  separate_rows(b, sep = '/|,') %>%
  separate_rows(c, sep = '/|,')

#> # A tibble: 14 × 2
#>    b     c    
#>    <chr> <chr>
#>  1 a     1    
#>  2 d     2    
#>  3 d     3    
#>  4 d     4    
#>  5 e     2    
#>  6 e     3    
#>  7 e     4    
#>  8 f     2    
#>  9 f     3    
#> 10 f     4    
#> 11 g     5    
#> 12 g     6    
#> 13 h     5    
#> 14 h     6

CodePudding user response:

Maybe this helps: [https://stackoverflow.com/questions/15347282/split-delimited-strings-in-a-column-and-insert-as-new-rows][1]

for the first column:

s <- strsplit(df$b, split = c(",","/"))
data.frame(a = rep(df$a, sapply(s, length)), b = unlist(s))
  •  Tags:  
  • r
  • Related