Some previous questions have been asked on this topic, but they don't seem to cover the case where a string contains multiple instances of the same delimiter:
How to extract substring between patterns "_" and "." in R
Extracting a string between other two strings in R
Extract a string between patterns/delimiters in R
The problem I am facing is the following. Say we have a vector like this:
vec <- c("Europe/Germany/Berlin/Mitte",
"Europe/Germany/Berlin/Charlottenburg",
"Europe/Croatia/Zagreb/Gornji Grad",
"Europe/Croatia/Zagreb/Donji Grad")
Can you provide me with the following two functions:
The output of the first function should be:
c("Germany", "Germany", "Croatia", "Croatia")
And the output of the second function should be:
c("Berlin", "Berlin", "Zagreb", "Zagreb")
I don't understand how the answers to the previous questions apply when the delimiter / appears more than once in the string, or how I can specify which of the pieces I want.
CodePudding user response:
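library(tidyverse)
# Split each string on "/" and return the piece at `position`.
# `x` defaults to the `vec` defined in the question.
get_name <- function(position, x = vec) {
  x %>%
    str_split("/") %>%
    map_chr(~ .x[position])
}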
Get position 2
get_name(2)
[1] "Germany" "Germany" "Croatia" "Croatia"
Get position 3
get_name(3)
[1] "Berlin" "Berlin" "Zagreb" "Zagreb"
CodePudding user response:
We can try using sapply() along with strsplit() here for a base R solution:
unname(sapply(vec, function(x) unlist(strsplit(x, "/"))[2]))
[1] "Germany" "Germany" "Croatia" "Croatia"
unname(sapply(vec, function(x) unlist(strsplit(x, "/"))[3]))
[1] "Berlin" "Berlin" "Zagreb" "Zagreb"
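The two calls above differ only in the index, so they could be wrapped in a small helper. This is a minimal base R sketch, not part of the original answer: the name get_piece and its sep argument are hypothetical, and it assumes the delimiter is a literal string rather than a regular expression.

```r
vec <- c("Europe/Germany/Berlin/Mitte",
         "Europe/Germany/Berlin/Charlottenburg",
         "Europe/Croatia/Zagreb/Gornji Grad",
         "Europe/Croatia/Zagreb/Donji Grad")

# Hypothetical helper: return the `position`-th piece of each string in `x`,
# split on the literal delimiter `sep`.
get_piece <- function(x, position, sep = "/") {
  # fixed = TRUE treats `sep` as a literal string, not a regular expression
  pieces <- strsplit(x, sep, fixed = TRUE)
  # vapply() always returns a character vector; strings with fewer than
  # `position` pieces yield NA
  vapply(pieces, function(p) p[position], character(1), USE.NAMES = FALSE)
}

country <- get_piece(vec, 2)
city <- get_piece(vec, 3)
```

Since strsplit() is already vectorised, the whole vector is split in a single call, and there are no names to strip with unname() afterwards.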
CodePudding user response:
Here is another option. When you have structured text like this, you can create four capture groups to match the text between the forward slashes, and then refer to the captured text by group number:
vec <- c("Europe/Germany/Berlin/Mitte",
"Europe/Germany/Berlin/Charlottenburg",
"Europe/Croatia/Zagreb/Gornji Grad",
"Europe/Croatia/Zagreb/Donji Grad")
rgx <- "(.*)/(.*)/(.*)/(.*)"
sub(rgx, "\\1", vec)
#> [1] "Europe" "Europe" "Europe" "Europe"
sub(rgx, "\\2", vec)
#> [1] "Germany" "Germany" "Croatia" "Croatia"
sub(rgx, "\\3", vec)
#> [1] "Berlin" "Berlin" "Zagreb" "Zagreb"
sub(rgx, "\\4", vec)
#> [1] "Mitte" "Charlottenburg" "Gornji Grad" "Donji Grad"
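One thing to keep in mind with sub() is that a string which does not match the pattern is returned unchanged, so a malformed path would pass through silently. As a hedged variation (not the answer's own method), the sketch below anchors the pattern, uses [^/]* so that no group can span a slash, and extracts the groups with regexec() and regmatches(), which gives NA for strings that do not have exactly four pieces. The variable names rgx_strict, country and city are illustrative only.

```r
vec <- c("Europe/Germany/Berlin/Mitte",
         "Europe/Germany/Berlin/Charlottenburg",
         "Europe/Croatia/Zagreb/Gornji Grad",
         "Europe/Croatia/Zagreb/Donji Grad",
         "Europe/Germany")  # deliberately malformed: only two pieces

# Anchored pattern; each group matches any run of non-slash characters
rgx_strict <- "^([^/]*)/([^/]*)/([^/]*)/([^/]*)$"

# For every string: the full match followed by the four captured groups,
# or character(0) when the string does not match
m <- regmatches(vec, regexec(rgx_strict, vec))

# Element 1 is the full match, so group k sits at index k + 1
country <- vapply(m, function(g) g[3], character(1))
city <- vapply(m, function(g) g[4], character(1))
```

Here the malformed fifth element comes back as NA in both results instead of being echoed back as the original string.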