Substring: From a Certain Position to End

I have this list ("sample_list"):

[1] "http://www.website.ca/extra/city1-aaa-bbb-ccc/"    
[2] "http://www.website.ca/extra/acity2-a2a-bbb-ccc/"   
[3] "http://www.website.ca/extra/bbcity3-a3a-bbb-ccc/"   
[4] "http://www.website.ca/extra/ccccity4-a77a-bbb-ccc/"   
[5] "http://www.website.ca/extra/dddddcity5-a2a-bbb-ccc/"

I want to extract the following parts from this list: city1, acity2, bbcity3, ccccity4, dddddcity5

I had the following idea about this. I noticed that every element in this list starts with the same prefix, "http://www.website.ca/extra/", so the part I want always begins at the 29th position.

my_substr = substr(sample_list, 1,29)

Is there some way I can modify the substring call so that everything is selected from the 29th position all the way to the first hyphen?

Thank you!

CodePudding user response：

x = c("http://www.website.ca/extra/city1-aaa-bbb-ccc/", "http://www.website.ca/extra/acity2-aaa-bbb-ccc/", 
"http://www.website.ca/extra/bbcity3-aaa-bbb-ccc/", "http://www.website.ca/extra/ccccity4-aaa-bbb-ccc/", 
"http://www.website.ca/extra/dddddcity5-aaa-bbb-ccc/")

From the 29th position all the way to the first hyphen? Yes,

substring(x, 29, stringr::str_locate(x, "-")[,1] - 1)

although other options exist for this task. Depending on preference, a regular expression might be more suitable:

stringr::str_extract(x, "(?<=extra/).*(?=-aaa-)")
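
Note that the lookahead "(?=-aaa-)" fits the simplified vector x above, but the original sample_list has varying second segments (a2a, a3a, a77a), so that pattern would not match most of its elements. Here is a minimal sketch in base R that stops at the first hyphen instead. It assumes the prefix contains no hyphen, and sample_list is rebuilt here from the question:

```r
# sample_list as printed in the question
sample_list <- c(
  "http://www.website.ca/extra/city1-aaa-bbb-ccc/",
  "http://www.website.ca/extra/acity2-a2a-bbb-ccc/",
  "http://www.website.ca/extra/bbcity3-a3a-bbb-ccc/",
  "http://www.website.ca/extra/ccccity4-a77a-bbb-ccc/",
  "http://www.website.ca/extra/dddddcity5-a2a-bbb-ccc/"
)

# regexpr() returns the position of the first match only,
# i.e. the first hyphen in each string
first_hyphen <- as.integer(regexpr("-", sample_list, fixed = TRUE))

# Option 1: characters from position 29 up to just before that hyphen
by_position <- substr(sample_list, 29, first_hyphen - 1)

# Option 2: capture the run of non-hyphen characters right after "/extra/"
by_regex <- sub("^.*/extra/([^-]*)-.*$", "\\1", sample_list)

print(by_position)
print(by_regex)
```

Option 1 depends on the fixed prefix length, while option 2 depends only on the "/extra/" marker, so it should keep working if the host name changes length.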

CodePudding user response：

Simply use str_extract from the stringr package:

library(stringr)

strings <- c(
    "http://www.website.ca/extra/city1-aaa-bbb-ccc/",
    "http://www.website.ca/extra/acity2-aaa-bbb-ccc/",
    "http://www.website.ca/extra/bbcity3-aaa-bbb-ccc/",
    "http://www.website.ca/extra/ccccity4-aaa-bbb-ccc/",
    "http://www.website.ca/extra/dddddcity5-aaa-bbb-ccc/"
)

str_extract(strings, "(?<=extra\\/)\\w+(?=-aaa-)")
#> [1] "city1"      "acity2"     "bbcity3"    "ccccity4"   "dddddcity5"

Created on 2022-07-07 by the reprex package (v2.0.1)