Google Cloud Platform lets you create labels for logs using the RE2 regex engine.
How can I create a regex that matches the path in the URL?
Example matches:
https://example.com/awesome --> "awesome"
https://example.com/awesome/path --> "awesome/path"
https://example.com/awesome/path/ --> "awesome/path"
https://example.com/awesome/path?arg1=123 --> "awesome/path"
Details:
- The domain and protocol are constant; they can be assumed to be https://example.com here.
- If there are multiple directories, they should be matched too, including the / in between.
- A trailing / should NOT be matched.
- Queries, e.g. ?arg1=123&arg2=456, should NOT be matched.
- It can be assumed that directory names will only contain alphanumeric characters (a-zA-Z0-9), dashes (-) and underscores (_).
Note that Google RE2 is different from PCRE2.
CodePudding user response:
The syntax reference isn't 100% clear about what is supported and what isn't. Assuming "(NOT SUPPORTED) VIM" means the construct is supported, just not in Vim, I'd start with a positive lookbehind for the beginning of the URL that you don't care about:
(?<=https:\/\/example\.com\/)
Then you want alphanumeric characters, dashes and underscores: [\w\-]+
followed by a non-trailing /
so I'd add a lookahead to verify that there are alphanumeric characters after the /
with (?=\/\w+)\/
The complete regex:
(?<=https:\/\/example\.com\/)([\w\-]+((?=\/\w+)\/|\b))
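One caveat: the RE2 syntax reference marks lookbehind (?<=...) and lookahead (?=...) as not supported, so the regex above may be rejected by RE2. A lookaround-free variant is sketched below, assuming the label extractor can read the value from a capture group instead of the whole match. It is written in Python only so that it can be run against the examples from the question; Python's re module is not RE2, but the pattern sticks to anchors, character classes, quantifiers and groups, which RE2 also documents. The names PATH_RE and extract_path are made up for this illustration.

```python
import re

# Hypothetical pattern: anchor on the constant prefix, then capture one or more
# directory names separated by single slashes. A trailing "/" or a "?query"
# cannot be part of the capture, because every "/" must be followed by at
# least one name character.
# re.ASCII keeps \w limited to [A-Za-z0-9_], as the question assumes.
PATH_RE = re.compile(r"^https://example\.com/([\w-]+(?:/[\w-]+)*)", re.ASCII)


def extract_path(url):
    """Return the captured path, or None if the URL has no path to capture."""
    m = PATH_RE.match(url)
    return m.group(1) if m else None


if __name__ == "__main__":
    for url in [
        "https://example.com/awesome",
        "https://example.com/awesome/path",
        "https://example.com/awesome/path/",
        "https://example.com/awesome/path?arg1=123",
    ]:
        print(url, "-->", extract_path(url))
```

The path is in capture group 1, so the prefix is matched but never returned, which is what the lookbehind was meant to achieve.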