R - how to extract a string between two delimiters when there are multiple instances of the same del

Time:11-13

Some previous questions have been asked on this topic, but they don't seem to cover the case where a string contains multiple instances of the same delimiter:

How to extract substring between patterns "_" and "." in R

Extracting a string between other two strings in R

Extract a string between patterns/delimiters in R

The problem I am facing is the following. Say we have a vector like this:

vec <- c("Europe/Germany/Berlin/Mitte", 
         "Europe/Germany/Berlin/Charlottenburg", 
         "Europe/Croatia/Zagreb/Gornji Grad", 
         "Europe/Croatia/Zagreb/Donji Grad")

Can you provide me with the following two functions:

The output of the first function should be:

c("Germany", "Germany", "Croatia", "Croatia")

And the output of the second function should be:

c("Berlin", "Berlin", "Zagreb", "Zagreb")

I don't understand how the answers to the previous questions apply when the delimiter / appears more than once in the string, or how I can specify which of the pieces I want.

CodePudding user response:

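library(tidyverse)

# Split each string on "/" and return the piece at `position`.
# `x` defaults to the `vec` defined in the question, so get_name(2) still works.
get_name <- function(position, x = vec) {
  x %>%
    str_split("/") %>%            # list with one character vector per string
    map_chr( ~ .x[position])      # pick the requested piece from each vector
}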

Get position 2

get_name(2)
[1] "Germany" "Germany" "Croatia" "Croatia"

Get position 3

get_name(3)
[1] "Berlin" "Berlin" "Zagreb" "Zagreb"

CodePudding user response:

We can try using sapply() along with strsplit() here for a base R solution:

unname(sapply(vec, function(x) unlist(strsplit(x, "/"))[2]))

[1] "Germany" "Germany" "Croatia" "Croatia"

unname(sapply(vec, function(x) unlist(strsplit(x, "/"))[3]))

[1] "Berlin" "Berlin" "Zagreb" "Zagreb"
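The same idea can be wrapped in a reusable function. The following is only a sketch: extract_piece() and its arguments are hypothetical names that do not come from the answer above. It swaps sapply() for vapply(), which checks that every element yields exactly one string, and passes fixed = TRUE so the delimiter is treated as a literal string rather than a regular expression.

```r
# Hypothetical helper (not part of the original answers).
extract_piece <- function(x, position, delim = "/") {
  # fixed = TRUE: match the delimiter literally, not as a regex
  pieces <- strsplit(x, delim, fixed = TRUE)
  # One character value per input string; a position past the
  # end of a split vector yields NA
  vapply(pieces, function(p) p[position], character(1))
}

vec <- c("Europe/Germany/Berlin/Mitte",
         "Europe/Germany/Berlin/Charlottenburg",
         "Europe/Croatia/Zagreb/Gornji Grad",
         "Europe/Croatia/Zagreb/Donji Grad")

extract_piece(vec, 2)
#> [1] "Germany" "Germany" "Croatia" "Croatia"
extract_piece(vec, 3)
#> [1] "Berlin" "Berlin" "Zagreb" "Zagreb"
```

Because strsplit() returns an unnamed list for an unnamed input vector, no unname() call is needed here.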

CodePudding user response:

Here is another option. When you have structured text like this, you can create four capture groups to capture the text between the forward slashes, then refer to the captured text by group number:

vec <- c("Europe/Germany/Berlin/Mitte", 
         "Europe/Germany/Berlin/Charlottenburg", 
         "Europe/Croatia/Zagreb/Gornji Grad", 
         "Europe/Croatia/Zagreb/Donji Grad")

rgx <- "(.*)/(.*)/(.*)/(.*)"

sub(rgx, "\\1", vec)
#> [1] "Europe" "Europe" "Europe" "Europe"
sub(rgx, "\\2", vec)
#> [1] "Germany" "Germany" "Croatia" "Croatia"
sub(rgx, "\\3", vec)
#> [1] "Berlin" "Berlin" "Zagreb" "Zagreb"
sub(rgx, "\\4", vec)
#> [1] "Mitte"          "Charlottenburg" "Gornji Grad"    "Donji Grad"
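
The pattern above relies on every string having exactly three slashes. With more slashes, the greedy first group would absorb the extra fields. A stricter variant, sketched below, uses [^/]* so that each group can only span a single field, and anchors the pattern with ^ and $. The name rgx_strict is introduced here for illustration and is not from the answer.

```r
vec <- c("Europe/Germany/Berlin/Mitte",
         "Europe/Germany/Berlin/Charlottenburg",
         "Europe/Croatia/Zagreb/Gornji Grad",
         "Europe/Croatia/Zagreb/Donji Grad")

# [^/]* matches a run of non-slash characters, so each capture
# group holds exactly one field between delimiters
rgx_strict <- "^([^/]*)/([^/]*)/([^/]*)/([^/]*)$"

sub(rgx_strict, "\\2", vec)
#> [1] "Germany" "Germany" "Croatia" "Croatia"
sub(rgx_strict, "\\3", vec)
#> [1] "Berlin" "Berlin" "Zagreb" "Zagreb"

# sub() returns the input unchanged when the pattern does not match,
# so strings with a different number of fields pass through as-is
sub(rgx_strict, "\\2", "Europe/Germany")
#> [1] "Europe/Germany"
```

Since a non-matching string comes back unmodified rather than as NA, it may be worth checking the input with grepl(rgx_strict, vec) before trusting the result.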