Regex matching a pattern and bounded strings

Time:11-24

I am trying to replace (technically delete) a level of folders from a vector of file path strings. The data look like this:

x<-c("d:/KeepItSimple/1234path21/WAVs/filename.wav",
     "d:/TryToKeepItSimple/5678path23/WAVs/filename2.wav")

I would like to use gsub and a regex to find the folder name containing "path" in each string and replace everything between its two surrounding forward slashes with nothing, essentially removing that folder level. You can assume that the folder name always contains "path" and is always 10 characters long (between the slashes).

After some head scratching I came up with this:

 gsub(".{4}path.{2}", "", x)

It works, but it leaves me with two questions:

  1. Is there a better way to express/accomplish this in regex?
  2. How could I make it match everything from the first / before "path" to the next / after "path"?

CodePudding user response:

You may use sub here as follows:

x <- c("d:/KeepItSimple/1234path21/WAVs/filename.wav",
       "d:/TryToKeepItSimple/5678path23/WAVs/filename2.wav")
output <- sub("/?[^/]*path[^/]*/?", "/", x)
output

[1] "d:/KeepItSimple/WAVs/filename.wav"      
[2] "d:/TryToKeepItSimple/WAVs/filename2.wav"
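Because both slashes in the pattern above are optional, it can also match a final path component that contains "path", such as a file name. The following is a minimal sketch of a stricter variant, assuming the folder to drop always sits between two slashes. The input `y` and the names `strict`, `y_strict` and `y_loose` are made up for illustration and are not from the question.

```r
x <- c("d:/KeepItSimple/1234path21/WAVs/filename.wav",
       "d:/TryToKeepItSimple/5678path23/WAVs/filename2.wav")

# Hypothetical extra input: "path" occurs in the file name, not in a folder name
y <- "d:/data/pathfile.wav"

# Require a slash on both sides, so only a complete folder level can match
strict <- sub("/[^/]*path[^/]*/", "/", x)
strict

# The file name is not followed by a slash, so y is left unchanged
y_strict <- sub("/[^/]*path[^/]*/", "/", y)

# With optional slashes the file name itself is matched and replaced by "/"
y_loose <- sub("/?[^/]*path[^/]*/?", "/", y)
```

On the two paths from the question both patterns give the same result; they differ only on inputs like `y`.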

CodePudding user response:

My understanding of the question is that the text to be replaced with an empty string must satisfy four requirements. It

  • is immediately preceded by a forward slash;
  • contains exactly one forward slash, which is its last character;
  • contains 11 characters (the 10-character folder name plus the trailing slash); and
  • contains the string "path".

We can match such a string with the following regular expression.

(?<=\/)(?=[^\/\n]{0,6}path)[^\/\n]{10}\/

The regex can be broken down as follows.

(?<=\/)         # positive lookbehind asserts match is preceded by a forward slash  
(?=             # begin positive lookahead
  [^\/\n]{0,6}  # match zero to 6 characters other than forward slashes and newlines
  path          # match literal
)               # end positive lookahead
[^\/\n]{10}     # match 10 characters other than forward slashes and newlines 
\/              # match a forward slash
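A minimal sketch of how this regex might be applied in R. Lookarounds are not supported by R's default regex engine, so `perl = TRUE` is passed to `sub`. The variable name `pattern` is only illustrative, and the forward slashes are left unescaped because they have no special meaning in an R pattern.

```r
x <- c("d:/KeepItSimple/1234path21/WAVs/filename.wav",
       "d:/TryToKeepItSimple/5678path23/WAVs/filename2.wav")

# Same regex as above, written as an R string ("\\n" reaches PCRE as \n)
pattern <- "(?<=/)(?=[^/\\n]{0,6}path)[^/\\n]{10}/"

# perl = TRUE selects the PCRE engine, which supports lookbehind and lookahead
output <- sub(pattern, "", x, perl = TRUE)
output
```

The lookbehind keeps the preceding slash out of the match, so replacing the match with an empty string leaves exactly one slash between the neighbouring folders.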