Home > Back-end >  How to match URL path using Google RE2 regex
How to match URL path using Google RE2 regex

Time:02-03

Google Cloud Platform lets you create label logs using the RE2 regex engine.

How can I create a regex that matches the path in the URL?

Examples matches:

https://example.com/awesome                  --> "awesome"
https://example.com/awesome/path             --> "awesome/path"
https://example.com/awesome/path/            --> "awesome/path"
https://example.com/awesome/path?arg1=123    --> "awesome/path"

Details:

  • The domain and protocol are constant, it can be assumed to be https://example.com here.
  • If there are multiple directories, they should be matched too, including the / in between.
  • Trailing / should NOT be matched.
  • Queries, e.g. ?arg1=123&arg2=456 should NOT be matched.
  • It can be assumed that directory names will only contain alphanumeric characters a-zA-Z0-9, dashes - and underscores _.

Note that Google RE2 is different than PCRE2.

CodePudding user response:

So the syntax isn't 100% clear what is supported and what isn't. Assuming (NOT SUPPORTED) VIM means it is supported but not on vim, I'd start with a negative look behind for the beginning of the url that you don't care about

(?<=https:\/\/example\.com\/)

Then you want alphanumeric characters [\w\-] followed by non trailing / so I'd add a lookahead to verify that there are alphanumeric characters after the / with (?=\/\w )\/

The complete regex

(?<=https:\/\/example\.com\/)([\w\-] ((?=\/\w )\/|\b)) 
  • Related