I want to split the complex names in my df1 at the second "_". Does anyone have a handy solution, or a suggestion for how to do this with tidyr?
library(tidyverse)
df1 <- tibble(complex_names=c("King_Arthur_II", "Queen_Elizabeth_I", "King_Charles_III"),
year=c(970,1920,2022)
)
df1
#> # A tibble: 3 × 2
#> complex_names year
#> <chr> <dbl>
#> 1 King_Arthur_II 970
#> 2 Queen_Elizabeth_I 1920
#> 3 King_Charles_III 2022
df1 |>
separate(complex_names,into = c("name", "number"), sep="the second comma")
#> Error in into("name", "number"): could not find function "into"
Created on 2022-09-27 with reprex v2.0.2
I want my data to look like this
name number year
King_Arthur II 970
...
Any help or guidance is highly appreciated.
CodePudding user response:
I'm no expert in regex, but this answer shows the regular expression to find the second underscore. You can then use this regular expression in separate():
library(tidyverse)
df1 <- tibble(complex_names=c("King_Arthur_II", "Queen_Elizabeth_I", "King_Charles_III"),
year=c(970,1920,2022)
)
df1
#> # A tibble: 3 × 2
#> complex_names year
#> <chr> <dbl>
#> 1 King_Arthur_II 970
#> 2 Queen_Elizabeth_I 1920
#> 3 King_Charles_III 2022
df1 |>
separate(complex_names, into = c("name", "number"), sep = "(_)(?=[^_]+$)")
#> # A tibble: 3 × 3
#> name number year
#> <chr> <chr> <dbl>
#> 1 King_Arthur II 970
#> 2 Queen_Elizabeth I 1920
#> 3 King_Charles III 2022
Created on 2022-09-27 with reprex v2.0.2
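To see why the lookahead picks the right underscore, here is a minimal base R sketch (no tidyr needed) that locates the match by hand. The vector is copied from the question; the helper variable names (pattern, pos) are mine. Note that the pattern really targets the last underscore, which happens to be the second one in this data.

```r
# Values copied from the question's complex_names column.
complex_names <- c("King_Arthur_II", "Queen_Elizabeth_I", "King_Charles_III")

# Match an underscore only if everything after it, up to the end of the
# string, contains no further underscore (i.e. the last underscore).
pattern <- "_(?=[^_]+$)"

# perl = TRUE is required for lookahead in base R regex functions.
pos <- as.integer(regexpr(pattern, complex_names, perl = TRUE))

# Cut each string on either side of the matched underscore.
name   <- substr(complex_names, 1, pos - 1)
number <- substr(complex_names, pos + 1, nchar(complex_names))

data.frame(name, number)
```

If a name could contain more than two underscores, the last underscore and the second underscore would no longer coincide, so it is worth checking which one you actually need.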
CodePudding user response:
You can also use extract (which specifies the entire string, not only the separator):
library(tidyr)
df1 |>
extract(complex_names, c("name", "number"), "(.*)_([^_]+)$")
output
# A tibble: 3 × 3
name number year
<chr> <chr> <dbl>
1 King_Arthur II 970
2 Queen_Elizabeth I 1920
3 King_Charles III 2022
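If you literally need the second underscore rather than the last one, a pattern anchored at the start of the string may be closer to what the question asks. Below is a rough base R sketch of that idea using capture groups with sub(). The fourth value, "King_Henry_VIII_Tudor", is a made-up example added only to show the difference; it is not in the original data.

```r
# First three values are from the question; the last one is hypothetical,
# added to show a name with more than two underscores.
x <- c("King_Arthur_II", "Queen_Elizabeth_I", "King_Charles_III",
       "King_Henry_VIII_Tudor")

# Group 1: everything up to (not including) the second underscore.
# Group 2: everything after the second underscore.
pattern <- "^([^_]*_[^_]*)_(.*)$"

name   <- sub(pattern, "\\1", x)
number <- sub(pattern, "\\2", x)

data.frame(name, number)
```

For the three original names this gives the same split as the answers above; for the hypothetical fourth name it yields "King_Henry" and "VIII_Tudor". The same two-group pattern should also be usable as the regex argument of extract().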