What is the simplest way to remove the text on both the left and the right side of a given character/text in R?
I have the following example data:
a = c("C:\\final docs with data/Gakenke_New_Sanitation.xlsx", "C:\\final docs with data/Gatsibo_New_Sanitation.xlsx", "C:\\final docs with data/Rutsiro_New_Sanitation.xlsx")
My expected output keeps only the names: Gakenke, Gatsibo and Rutsiro.
I know I can break this task down and handle it with mutate() in two steps, as follows:
tibble(a) %>% mutate(a = str_remove(a, "C.*/"), a = str_remove(a, "_.*"))
My question is: which single, simple pattern can I pass to mutate() to get my intended result: Gakenke, Gatsibo and Rutsiro?
Any help is much appreciated. Thank you!
CodePudding user response:
A possible solution, based on stringr::str_extract and lookarounds:
library(tidyverse)
a %>%
str_extract("(?<=data\\/).*(?=\\_New)")
#> [1] "Gakenke" "Gatsibo" "Rutsiro"
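Since the question asks about mutate(), here is a minimal sketch of the same lookaround pattern applied to a data frame column. The data frame `df` and its column name `a` are assumptions made for illustration; the escapes on `/` and `_` are dropped because neither character needs escaping in a regex.

```r
library(dplyr)
library(stringr)

# Hypothetical data frame holding the paths from the question in column `a`
df <- data.frame(
  a = c("C:\\final docs with data/Gakenke_New_Sanitation.xlsx",
        "C:\\final docs with data/Gatsibo_New_Sanitation.xlsx",
        "C:\\final docs with data/Rutsiro_New_Sanitation.xlsx"),
  stringsAsFactors = FALSE
)

# Keep only the text preceded by "data/" and followed by "_New"
result <- df %>%
  mutate(a = str_extract(a, "(?<=data/).*(?=_New)"))

result$a
```

Note that this relies on every path containing both "data/" and "_New"; str_extract returns NA for any string where the pattern does not match.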
CodePudding user response:
You can use
a = c("C:\\final docs with data/Gakenke_New_Sanitation.xlsx", "C:\\final docs with data/Gatsibo_New_Sanitation.xlsx", "C:\\final docs with data/Rutsiro_New_Sanitation.xlsx")
library(stringr)
str_remove_all(a, "^.*/|_.*")
## => [1] "Gakenke" "Gatsibo" "Rutsiro"
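stringr::str_remove_all removes all occurrences of the pattern. ^.*/|_.* is an alternation with two branches: ^.*/ matches from the start of the string through the last / (because .* is greedy), and _.* matches from the first remaining _ to the end of the string. Both matches are removed, leaving only the name in between. Note that the string is assumed to contain no line break characters, since . does not match them.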
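If you would rather not load stringr, the same pattern should also work with base R's gsub(). The sketch below additionally shows a variant built on basename(), which assumes the part to drop on the left is always a directory path ending in /. The variable names `with_gsub` and `with_basename` are chosen for illustration only.

```r
a <- c("C:\\final docs with data/Gakenke_New_Sanitation.xlsx",
       "C:\\final docs with data/Gatsibo_New_Sanitation.xlsx",
       "C:\\final docs with data/Rutsiro_New_Sanitation.xlsx")

# Same alternation as above: drop everything through the last "/",
# and everything from the first remaining "_" onwards
with_gsub <- gsub("^.*/|_.*", "", a)

# Variant: let basename() strip the directory part,
# then remove everything from the first "_" onwards
with_basename <- sub("_.*", "", basename(a))

with_gsub
with_basename
```

The gsub() version mirrors str_remove_all exactly, while the basename() version may read more clearly when the inputs are known to be file paths.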