How to remove text on both sides of a given character/text at once (regex)?

Time:03-24

What is the simplest way to remove the text on both the left and the right side of a given character/text in R?

I have the following example data:

a = c("C:\\final docs with data/Gakenke_New_Sanitation.xlsx", "C:\\final docs with data/Gatsibo_New_Sanitation.xlsx", "C:\\final docs with data/Rutsiro_New_Sanitation.xlsx")

My expected output keeps only the names: Gakenke, Gatsibo and Rutsiro.

I know I can break this task down and handle it in two steps with mutate(), as follows:

a %>% mutate(a = str_remove(a, "C.*/"), a = str_remove(a, "_.*"))

My question is: what single, simple pattern can I pass inside that mutate() call to get my intended result (Gakenke, Gatsibo and Rutsiro) in one step?

Any help is much appreciated. Thank you!

CodePudding user response:

A possible solution, based on stringr::str_extract and lookaround:

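library(tidyverse)

# Keep only the text after "data/" and before "_New"; the lookarounds
# check that context without including it in the match.
a %>% 
  str_extract("(?<=data/).*(?=_New)")

#> [1] "Gakenke" "Gatsibo" "Rutsiro"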
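
For readers who would rather not load stringr, the same lookaround idea can be sketched in base R. This is only an illustrative equivalent, not part of the original answer: the variable names `m` and `extracted` are made up here, and `perl = TRUE` is needed because lookarounds are not supported by R's default regex engine.

```r
# Example paths copied from the question.
a <- c("C:\\final docs with data/Gakenke_New_Sanitation.xlsx",
       "C:\\final docs with data/Gatsibo_New_Sanitation.xlsx",
       "C:\\final docs with data/Rutsiro_New_Sanitation.xlsx")

# Locate the text preceded by "data/" and followed by "_New".
# perl = TRUE switches to the PCRE engine, which understands lookarounds.
m <- regexpr("(?<=data/).*(?=_New)", a, perl = TRUE)

# Pull out the matched substrings (hypothetical variable name).
extracted <- regmatches(a, m)
extracted
#> [1] "Gakenke" "Gatsibo" "Rutsiro"
```

Note that regmatches() silently drops elements with no match, so the result can be shorter than the input if some paths do not follow the expected naming scheme.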

CodePudding user response:

You can use

a = c("C:\\final docs with data/Gakenke_New_Sanitation.xlsx", "C:\\final docs with data/Gatsibo_New_Sanitation.xlsx",  "C:\\final docs with data/Rutsiro_New_Sanitation.xlsx")
library(stringr)
str_remove_all(a, "^.*/|_.*")
## => [1] "Gakenke" "Gatsibo" "Rutsiro"

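stringr::str_remove_all removes every match of the pattern, not just the first one. The pattern ^.*/|_.* has two alternatives: ^.*/ matches from the start of the string through the last / (because .* is greedy), and _.* matches from the first remaining _ to the end of the string. This assumes the strings contain no line break characters, since . does not match them by default.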
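
Since the question asks about doing this on a column, here is a minimal sketch of applying the same pattern to a data frame in base R. The data frame `df` and the column names `path` and `district` are hypothetical, chosen only for illustration, and gsub() stands in for str_remove_all() so that no packages are required.

```r
# Hypothetical data frame holding the paths from the question in a column `path`.
df <- data.frame(
  path = c("C:\\final docs with data/Gakenke_New_Sanitation.xlsx",
           "C:\\final docs with data/Gatsibo_New_Sanitation.xlsx",
           "C:\\final docs with data/Rutsiro_New_Sanitation.xlsx"),
  stringsAsFactors = FALSE
)

# Replace both alternatives with the empty string:
#   ^.*/  everything from the start through the last "/"
#   _.*   everything from the first remaining "_" to the end
df$district <- gsub("^.*/|_.*", "", df$path)

df$district
#> [1] "Gakenke" "Gatsibo" "Rutsiro"
```

With dplyr and stringr loaded, df %>% mutate(district = str_remove_all(path, "^.*/|_.*")) should produce the same column in a single mutate() step.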
