I am trying to subset a string from the back based on occurrence of underscores. Example follows:
string <- "trash_trash_trash_keep_keep_keep_trash.trash"
I want to extract the substring that ends at the last underscore and extends back to the nth underscore before it. In this example, the desired output is:
"keep_keep_keep"
My attempt so far is not working: '^(?:[^_]*){3}(.+)_'
I think I should approach the problem from the back of the string instead of from the start.
Any input is appreciated.
CodePudding user response:
I think this regex should do. It uses a positive lookbehind and a positive lookahead to isolate the correct fragment:
string <- "trash_trash_trash_keep_keep_keep_trash.trash"
stringr::str_extract(string, "(?<=^([^_]{0,999}_){3}).+(?=_[^_]*$)")
# [1] "keep_keep_keep"
You can change the 3 in the regex to control how many underscores are skipped from the front of the string.
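If counting from the back is preferred, as the question suggests, here is a minimal base R sketch. The helper name `extract_before_last` is hypothetical and not part of the answer above. It assumes a single input string whose pieces between underscores are non-empty, and it returns `character(0)` when nothing matches.

```r
# Hypothetical helper: return the n underscore-separated pieces that sit
# immediately before the last underscore of a single string.
extract_before_last <- function(x, n) {
  stopifnot(length(x) == 1, n >= 1)
  # (?:[^_]+_){n-1}[^_]+  matches n non-empty pieces joined by underscores
  # (?=_[^_]*$)           requires the final underscore and the tail to follow
  pattern <- sprintf("(?:[^_]+_){%d}[^_]+(?=_[^_]*$)", as.integer(n) - 1L)
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

string <- "trash_trash_trash_keep_keep_keep_trash.trash"
extract_before_last(string, 3)
# [1] "keep_keep_keep"
extract_before_last(string, 2)
# [1] "keep_keep"
```

Because the pattern is anchored at the end of the string, `n` here counts pieces backwards from the last underscore rather than underscores from the front.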
CodePudding user response:
This is similar to @Kat's approach in the comment but using a function to make it dynamic.
string <- "trash_trash_trash_keep_keep_keep_trash.trash"
return_last_n_words <- function(x, n) {
  strsplit(x, '_')[[1]] |> head(-1) |> tail(n) |> paste0(collapse = "_")
}
return_last_n_words(string, 3)
#[1] "keep_keep_keep"
return_last_n_words(string, 4)
#[1] "trash_keep_keep_keep"
return_last_n_words(string, 2)
#[1] "keep_keep"
The idea is to split the string by underscore (_), drop the last part, select the last n words, and paste them back into one string.
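The function above only looks at the first element of `x` (because of `[[1]]`). If several strings need processing at once, the same split, drop, select, and paste steps could be applied per element. This is a sketch only, and `return_last_n_words_vec` is a hypothetical name:

```r
# Hypothetical vectorised variant: apply the same steps to every element of x.
return_last_n_words_vec <- function(x, n) {
  vapply(strsplit(x, "_", fixed = TRUE), function(parts) {
    kept <- tail(head(parts, -1), n)   # drop the last part, keep the last n
    paste(kept, collapse = "_")        # glue the kept parts back together
  }, character(1))
}

strings <- c("trash_trash_trash_keep_keep_keep_trash.trash", "a_b_c_d")
return_last_n_words_vec(strings, 2)
# [1] "keep_keep" "b_c"
```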