I'm cleaning some data where there are multiple columns that need to be split into rows with both ',' and '/'. Data table below to explain what it the source code looks like.
df <- data.table(
b = c("a", "d/e/f", "g,h"),
c = c("1", "2,3,4", "5/6")
)
I've tried using separate_rows, but it can only split one column on one of these separators at a time.
EDIT: The data table I'm looking for looks approximately like this:
df_clean <- data.table(
b = c("a", "d", "d", "d",
"e", "e", "e", "f",
"f", "f", "g", "g",
"h", "h"),
c = c("1", "2", "3", "4",
"2", "3", "4",
"2", "3", "4",
"5", "6",
"5", "6")
)
CodePudding user response:
Updated answer based on added clarification.
Run separate_rows
once on each column to get all of the permutations. You can use a regex pattern to specify multiple separators.
library(tidyr)
df %>%
separate_rows(b, sep = '/|,') %>%
separate_rows(c, sep = '/|,')
#> # A tibble: 14 × 2
#> b c
#> <chr> <chr>
#> 1 a 1
#> 2 d 2
#> 3 d 3
#> 4 d 4
#> 5 e 2
#> 6 e 3
#> 7 e 4
#> 8 f 2
#> 9 f 3
#> 10 f 4
#> 11 g 5
#> 12 g 6
#> 13 h 5
#> 14 h 6
CodePudding user response:
Maybe this helps: [https://stackoverflow.com/questions/15347282/split-delimited-strings-in-a-column-and-insert-as-new-rows][1]
for the first column:
s <- strsplit(df$b, split = c(",","/"))
data.frame(a = rep(df$a, sapply(s, length)), b = unlist(s))