We have a requirement to convert regular expressions to CloudFront-supported glob patterns and vice versa. First of all, is this possible, and if so, how can we achieve it? I am especially unsure about the regex-to-glob direction: as I understand it, regex is a superset of glob, so it might not be possible to convert every regex to a corresponding glob.
CodePudding user response:
To convert from a glob, you would need to write a parser that splits the pattern into an abstract syntax tree. For example, the glob `*-{[0-9],draft}.docx` might parse to `[Anything(), "-", OneOf([Range("0", "9"), "draft"]), ".docx"]`.
Then you would walk the AST and output the equivalent regular expression for each node. For example, the rules you might use for this could be:
Anything() -> .*
Range(x, y) -> [x-y]
OneOf(x, y) -> (x|y)
resulting in the regular expression `.*-([0-9]|draft).docx`.
That's not perfect, because you also have to remember to escape any special characters: `.` is a special character in regular expressions, so you should escape it, finally yielding `.*-([0-9]|draft)\.docx`.
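As a rough illustration of the parse-then-walk approach, here is a minimal Python sketch. The function names (`parse_glob`, `to_regex`, `glob_to_regex`) and the node tags are my own, not from any library, and the sketch assumes only `*`, `?`, simple `[...]` classes, and `{a,b}` alternatives need handling. Treat it as a starting point rather than a complete implementation.

```python
import re


def split_braces(pattern, start):
    """Return (options, index after the closing brace) for the '{' at start."""
    depth, options, current = 0, [], []
    for i in range(start, len(pattern)):
        c = pattern[i]
        if c == "{":
            depth += 1
            if depth == 1:
                continue  # skip the outermost opening brace itself
        elif c == "}":
            depth -= 1
            if depth == 0:
                options.append("".join(current))
                return options, i + 1
        elif c == "," and depth == 1:
            options.append("".join(current))
            current = []
            continue
        current.append(c)
    raise ValueError("unbalanced '{' in glob")


def parse_glob(pattern):
    """Parse a glob into a list of (kind, value) nodes: a small AST."""
    nodes, i = [], 0
    while i < len(pattern):
        c = pattern[i]
        if c == "*":
            nodes.append(("anything", None))
            i += 1
        elif c == "?":
            nodes.append(("any_char", None))
            i += 1
        elif c == "[":
            end = pattern.index("]", i + 2)  # ValueError if unterminated
            nodes.append(("char_class", pattern[i + 1:end]))
            i = end + 1
        elif c == "{":
            options, i = split_braces(pattern, i)
            # Each option is itself a glob, so parse it recursively.
            nodes.append(("one_of", [parse_glob(o) for o in options]))
        else:
            nodes.append(("literal", c))
            i += 1
    return nodes


def to_regex(nodes):
    """Walk the AST and emit the regex fragment for each node."""
    parts = []
    for kind, value in nodes:
        if kind == "anything":
            parts.append(".*")
        elif kind == "any_char":
            parts.append(".")
        elif kind == "char_class":
            negated = value.startswith("!")
            body = value[1:] if negated else value
            body = body.replace("\\", "\\\\")
            parts.append("[" + ("^" if negated else "") + body + "]")
        elif kind == "one_of":
            parts.append("(" + "|".join(to_regex(o) for o in value) + ")")
        else:
            parts.append(re.escape(value))  # escapes '.', '-', etc.
    return "".join(parts)


def glob_to_regex(pattern):
    return to_regex(parse_glob(pattern))


regex = glob_to_regex("*-{[0-9],draft}.docx")
print(regex)
print(bool(re.fullmatch(regex, "report-draft.docx")))
```

Because `re.escape` also escapes `-`, the printed regex may contain a few more backslashes than the hand-written one, but it should match the same strings. Note the use of `re.fullmatch`: a glob is implicitly anchored at both ends, whereas a regex is not.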
Strictly speaking, not every regular expression can be translated to a glob pattern. The Kleene star operation does not exist in globbing; the simple regular expression `a*` (i.e., any number of `a` characters) cannot be translated to a glob pattern.
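For the reverse direction, one option is to accept only a restricted regex subset and fail loudly on everything else. The sketch below is a hypothetical helper (`regex_to_glob` is my own name) that assumes the regex is meant to match the whole string and handles only literals, `.`, `.*`, simple character classes, and escaped punctuation. Groups and alternation are rejected for brevity, even though `{a,b}` could express some of them.

```python
QUANTIFIERS = "*+?{"


def regex_to_glob(regex):
    """Translate a small regex subset to a glob, or raise ValueError."""
    out, i = [], 0
    while i < len(regex):
        c = regex[i]
        if c == ".":
            atom, i = "?", i + 1
        elif c == "\\":
            if i + 1 >= len(regex) or regex[i + 1].isalnum():
                raise ValueError("escapes such as \\d or \\w are not handled")
            ch = regex[i + 1]
            # Bracket characters that would otherwise be glob wildcards.
            atom = "[" + ch + "]" if ch in "*?[{" else ch
            i += 2
        elif c == "[":
            end = regex.index("]", i + 2)  # ValueError if unterminated
            body = regex[i + 1:end]
            if "\\" in body:
                raise ValueError("escapes inside [...] are not handled")
            if body.startswith("^"):
                body = "!" + body[1:]  # regex negation -> glob negation
            atom, i = "[" + body + "]", end + 1
        elif c in "()|^$" + QUANTIFIERS:
            raise ValueError(f"{c!r} is not supported by this sketch")
        else:
            atom, i = c, i + 1

        # A quantifier is only translatable when it is '*' applied to '.'.
        if i < len(regex) and regex[i] in QUANTIFIERS:
            if c == "." and regex[i] == "*":
                atom, i = "*", i + 1
            else:
                raise ValueError(
                    f"quantifier {regex[i]!r} after {atom!r} has no glob equivalent"
                )
        out.append(atom)
    return "".join(out)


print(regex_to_glob(r".*-[0-9]\.docx"))

try:
    regex_to_glob("a*")
except ValueError as exc:
    print("cannot convert:", exc)
```

The `a*` case raises `ValueError`, which mirrors the point above: repetition of anything other than "any character" has no counterpart in ordinary globs.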
I'm not sure what types of globs CloudFront supports (the documentation returned no hits for the term "glob"), but here is some documentation on commonly-supported shell glob pattern wildcards.
Here is a summary of some equivalent sequences:
Glob Wildcard | Regular Expression | Meaning |
---|---|---|
`?` | `.` | Any single character |
`*` | `.*` | Zero or more characters |
`[a-z]` | `[a-z]` | Any character from the range |
`[!a-m]` | `[^a-m]` | A character not in the range |
`[abc]` | `[abc]` | One of the given characters |
`{cat,dog,bat}` | `(cat\|dog\|bat)` | One of the given options |
`{*.tar,*.gz}` | `(.*\.tar\|.*\.gz)` | One of the given options, considering nested wildcards |
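
If Python is an option, the standard library's `fnmatch.translate` already covers the first four rows of this table (`*`, `?`, `[seq]`, `[!seq]`). It does not handle brace alternatives, so those would still need custom code. A small check, assuming shell-style semantics are close enough to what your target system expects:

```python
import fnmatch
import re

# Print the regex that the standard library produces for each wildcard.
# The exact text of the output differs between Python versions.
for glob in ["?", "*", "[a-z]", "[!a-m]"]:
    print(f"{glob!r:10} -> {fnmatch.translate(glob)!r}")

# The translated pattern is anchored at the end, so re.match suffices.
pattern = re.compile(fnmatch.translate("*-[0-9].docx"))
print(bool(pattern.match("report-3.docx")))  # True
print(bool(pattern.match("report-x.docx")))  # False
```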