Home > front end >  R regular expressions, separate column
R regular expressions, separate column

Time:11-06

How to separate a column into many, based on a symbol "|" :

 input <- tibble(A = c("Ae1 tt1 | Ae2 tt2", "Be1 | Be2 | Be3")) 

 output <-  tibble(B = c("Ae1 tt1", "Be1") , C = c("Ae2 tt2", "Be2"), D = c(NA, "Be3")) 

I tried :

 input %>% 
    separate(A, c("B","C","D"))
    #separate(A, c("B","C","D"), sep = "|.")
    #mutate(B = str_split(A, "*|")) %>% unnest

What is the syntax with regex ?

Thanks

CodePudding user response:

You can use

output <- input %>%
    separate(col=A, into=c("B","C","D"), sep="\\s*\\|\\s*", fill="right")

R test:

> input %>% separate(col=A, into=c("B","C","D"), sep="\\s*\\|\\s*", fill="right")
# A tibble: 2 x 3
  B       C       D    
  <chr>   <chr>   <chr>
1 Ae1 tt1 Ae2 tt2 <NA> 
2 Be1     Be2     Be3 

The \s*\|\s* pattern matches a pipe char with any zero or more whitespace chars on both ends of the pipe.

The fill="right" argument fills with missing values on the right.

  • Related