Regex to match if string contains 4 or more backslashes

Time:01-04

Hi, I'm very new to regex and am struggling to make this work.

I have a variable in Data Studio that contains the page URLs. I need to keep only a subset of pages in a table - those that contain 4 or more backslashes:

For example, these URLs should be in the subset:

  • /abc/state/region/place1
  • /abc/state/region/place1/details
  • /abc/territory/region/place2
  • /abc/state/region/place3/details/more-specific

Whereas these URLs should be excluded:

  • /abc/state/region
  • /abc
  • /abc/xyz-page

I thought something like \/{4,} would work, but it doesn't seem to return any results.

CodePudding user response:

First, those are slashes; backslashes go the other way (\). The distinction matters because backslashes have syntactic roles in many programming languages, including in regular expressions: putting a backslash in front of a character switches it between its special function and simply matching itself. In regexes, / is not special, so you don't have to put a backslash in front of it (unless you're using a language where the regex itself is delimited by /s on the outside of the whole thing, but since you're using Google Data Studio I'm pretty sure regexes are entered as plain strings).
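As a quick check of the point about escaping, here is a minimal sketch. It uses Python's re module purely for illustration (Data Studio uses RE2, not Python, but both treat / as an ordinary character); the sample URL comes from the question.

```python
import re

url = "/abc/state/region/place1"

# An escaped slash and a plain slash are the same pattern element:
# both simply match the "/" character.
escaped = re.findall(r"\/", url)
plain = re.findall(r"/", url)

print(escaped == plain)  # True: the backslash changes nothing here
print(len(plain))        # 4: the number of slashes in the URL
```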

Second, the regex /{4,} only matches four or more slashes with nothing else between them. So it matches ////, but not ///hello/, etc. Everything in the string – or at least, everything in the part of the string matched by the regex – has to be accounted for in the regex. Perhaps you could try something like this:

(?:[^/]*/[^/]*){4,}

which will match four or more repetitions of "a slash with some optional non-slashes before and/or after it". Note that this is a partial regex, designed for something like REGEXP_CONTAINS; it doesn't match the whole string. If you want to turn it into something you can pass to REGEXP_MATCH, just put a .* on the front and back of it:

.*(?:[^/]*/[^/]*){4,}.*
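The patterns above can be tried against the URLs from the question. This is only a sketch under some assumptions: it uses Python's re module rather than Data Studio itself, with re.search standing in for REGEXP_CONTAINS (match anywhere) and re.fullmatch standing in for REGEXP_MATCH (match the whole string). The helper names has_four_slashes and full_match are made up for this example.

```python
import re

# Patterns from the answer, plus the original attempt for comparison.
CONTAINS = re.compile(r"(?:[^/]*/[^/]*){4,}")
FULL = re.compile(r".*(?:[^/]*/[^/]*){4,}.*")
NAIVE = re.compile(r"/{4,}")  # only matches four or more consecutive slashes

keep = [
    "/abc/state/region/place1",
    "/abc/state/region/place1/details",
    "/abc/territory/region/place2",
    "/abc/state/region/place3/details/more-specific",
]
drop = ["/abc/state/region", "/abc", "/abc/xyz-page"]


def has_four_slashes(url):
    """Partial match, roughly what REGEXP_CONTAINS would do."""
    return CONTAINS.search(url) is not None


def full_match(url):
    """Whole-string match, roughly what REGEXP_MATCH would do."""
    return FULL.fullmatch(url) is not None


for url in keep + drop:
    print(url, has_four_slashes(url), full_match(url), NAIVE.search(url) is not None)
```

In this sketch the first two checks agree for every URL in both lists, while the /{4,} attempt finds nothing in any of them, because none of the URLs has four slashes in a row.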

See https://github.com/google/re2/wiki/Syntax for what GDS supports in its regexes.
