Separate over several delimiting factors in R tidyr

Time:09-27

I want to split the compound names in my df1 at the second "_". Does anyone have a handy solution or a suggestion for how to do this with tidyr?

library(tidyverse)

df1 <- tibble(complex_names=c("King_Arthur_II", "Queen_Elizabeth_I", "King_Charles_III"), 
       year=c(970,1920,2022)
)
df1 
#> # A tibble: 3 × 2
#>   complex_names      year
#>   <chr>             <dbl>
#> 1 King_Arthur_II      970
#> 2 Queen_Elizabeth_I  1920
#> 3 King_Charles_III   2022

df1 |> 
separate(complex_names,into = c("name", "number"), sep="the second comma")
#> Error in into("name", "number"): could not find function "into"

Created on 2022-09-27 with reprex v2.0.2

I want my data to look like this

name           number  year 
King_Arthur      II     970
...

Any help or guidance is highly appreciated.

CodePudding user response:

I'm no expert in regex, but the regular expression below matches the last underscore in a string, which is the second underscore in these names. You can use it as the sep argument of separate():

library(tidyverse)

df1 <- tibble(complex_names=c("King_Arthur_II", "Queen_Elizabeth_I", "King_Charles_III"), 
              year=c(970,1920,2022)
)
df1 
#> # A tibble: 3 × 2
#>   complex_names      year
#>   <chr>             <dbl>
#> 1 King_Arthur_II      970
#> 2 Queen_Elizabeth_I  1920
#> 3 King_Charles_III   2022


df1 |> 
  separate(complex_names, into = c("name", "number"), sep = "(_)(?=[^_]+$)")
#> # A tibble: 3 × 3
#>   name            number  year
#>   <chr>           <chr>  <dbl>
#> 1 King_Arthur     II       970
#> 2 Queen_Elizabeth I       1920
#> 3 King_Charles    III     2022

Created on 2022-09-27 with reprex v2.0.2
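
To see what that pattern actually matches, here is a minimal base R sketch (no tidyr needed). It assumes the same three names as above; the variable names x, pos, name and number are only illustrative. The lookahead `(?=[^_]+$)` requires perl = TRUE in base R.

```r
x <- c("King_Arthur_II", "Queen_Elizabeth_I", "King_Charles_III")

# Position of the underscore that is followed only by non-underscore
# characters up to the end of the string, i.e. the last underscore.
pos <- as.integer(regexpr("_(?=[^_]+$)", x, perl = TRUE))

# Everything before the match is the name, everything after is the number.
name   <- substr(x, 1, pos - 1)
number <- substr(x, pos + 1, nchar(x))

data.frame(name, number)
```

Because the pattern anchors on the end of the string, it splits at the last underscore. That coincides with the second underscore here only because each name contains exactly two.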

CodePudding user response:

You can also use extract(), whose regex describes the entire string with one capture group per new column, rather than only the separator:

library(tidyr)
df1 |>
  extract(complex_names, c("name", "number"), "(.*)_([^_]+)$")

output

# A tibble: 3 × 3
  name            number  year
  <chr>           <chr>  <dbl>
1 King_Arthur     II       970
2 Queen_Elizabeth I       1920
3 King_Charles    III     2022
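
If some names could contain more than two underscores, a variant of the same extract() idea can split strictly at the second one. This is only a sketch: the data frame df2, the column name rest, and the name "King_Henry_VIII_Tudor" are made up for illustration and are not part of the question's data.

```r
library(tidyr)

# Hypothetical data: the second name has three underscores.
df2 <- data.frame(
  complex_names = c("King_Arthur_II", "King_Henry_VIII_Tudor")
)

# Group 1: text up to (not including) the second underscore.
# Group 2: everything after the second underscore.
out <- tidyr::extract(
  df2, complex_names,
  into  = c("name", "rest"),
  regex = "^([^_]*_[^_]*)_(.*)$"
)
out
```

Here the first group is built from "no underscores, one underscore, no underscores", so it can never run past the second underscore, whereas the greedy `(.*)` in the pattern above always runs up to the last one.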