How to separate a column into many, based on a symbol "|" :
input <- tibble(A = c("Ae1 tt1 | Ae2 tt2", "Be1 | Be2 | Be3"))
output <- tibble(B = c("Ae1 tt1", "Be1") , C = c("Ae2 tt2", "Be2"), D = c(NA, "Be3"))
I tried :
input %>%
separate(A, c("B","C","D"))
#separate(A, c("B","C","D"), sep = "|.")
#mutate(B = str_split(A, "*|")) %>% unnest
What is the syntax with regex ?
Thanks
CodePudding user response:
You can use
output <- input %>%
separate(col=A, into=c("B","C","D"), sep="\\s*\\|\\s*", fill="right")
R test:
> input %>% separate(col=A, into=c("B","C","D"), sep="\\s*\\|\\s*", fill="right")
# A tibble: 2 x 3
B C D
<chr> <chr> <chr>
1 Ae1 tt1 Ae2 tt2 <NA>
2 Be1 Be2 Be3
The \s*\|\s*
pattern matches a pipe char with any zero or more whitespace chars on both ends of the pipe.
The fill="right"
argument fills with missing values on the right.