Regex to Glob and vice-versa conversion

Time:12-05

We have a requirement to convert regular expressions to CloudFront-supported globs and vice versa. Is this possible at all, and if so, how can we achieve it? I'm especially unsure about the regex-to-glob direction: as I understand it, regex is a superset of glob, so it might not be possible to convert every regex to a corresponding glob.

CodePudding user response:

To convert from a glob you would need to write a parser that splits the pattern into an abstract syntax tree (AST). For example, the glob *-{[0-9],draft}.docx might parse to [Anything(), "-", OneOf([Range("0", "9"), "draft"]), ".docx"].

Then you would walk the AST and output the equivalent regular expression for each node. For example, you might use these rules:

Anything()  -> .*
Range(x, y) -> [x-y]
OneOf(x, y) -> (x|y)

resulting in the regular expression .*-([0-9]|draft).docx.

That's not quite right yet, because you also have to escape any characters that are special in regular expressions. The . matches any character in a regex, so it has to be escaped, which yields the final pattern .*-([0-9]|draft)\.docx.
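The parse-then-walk approach described above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: the node names ("lit", "any", "one", "class", "alt") and the functions parse, emit and glob_to_regex are made up for this example, the glob dialect is the common shell one with brace alternation, and backslash escapes inside the glob are not handled.

```python
import re


def parse(glob, i=0, in_braces=False):
    """Parse glob[i:] into a list of AST nodes; return (nodes, next_index).

    Nodes are tuples: ("lit", char), ("any",), ("one",),
    ("class", body) and ("alt", [list_of_nodes, ...]).
    """
    nodes = []
    while i < len(glob):
        c = glob[i]
        if in_braces and c in ",}":
            break  # the caller handles the separator or the closing brace
        if c == "*":
            nodes.append(("any",))
            i += 1
        elif c == "?":
            nodes.append(("one",))
            i += 1
        elif c == "[":
            # Start searching one character in, so "[]]" is a valid class.
            end = glob.index("]", i + 2)  # ValueError if unterminated
            nodes.append(("class", glob[i + 1:end]))
            i = end + 1
        elif c == "{":
            options = []
            i += 1
            while True:
                option, i = parse(glob, i, in_braces=True)
                options.append(option)
                if i >= len(glob):
                    raise ValueError("unterminated '{' in glob")
                i += 1  # skip the ',' or '}' that ended this option
                if glob[i - 1] == "}":
                    break
            nodes.append(("alt", options))
        else:
            nodes.append(("lit", c))
            i += 1
    return nodes, i


def emit(nodes):
    """Walk the AST and build the equivalent regular expression."""
    out = []
    for node in nodes:
        kind = node[0]
        if kind == "lit":
            out.append(re.escape(node[1]))  # escape regex-special characters
        elif kind == "any":
            out.append(".*")
        elif kind == "one":
            out.append(".")
        elif kind == "class":
            body = node[1].replace("\\", "\\\\")
            if body.startswith("!"):
                body = "^" + body[1:]  # glob negation -> regex negation
            out.append("[" + body + "]")
        elif kind == "alt":
            out.append("(" + "|".join(emit(opt) for opt in node[1]) + ")")
    return "".join(out)


def glob_to_regex(glob):
    nodes, _ = parse(glob)
    return emit(nodes)
```

For example, re.fullmatch(glob_to_regex("*-{[0-9],draft}.docx"), "report-draft.docx") finds a match, while "report-final.docx" does not. Use re.fullmatch rather than re.search, because a glob always has to match the whole string.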

Strictly speaking, not every regular expression can be translated to a glob pattern. The glob * only means "any run of characters"; standard globbing has no general Kleene star that repeats an arbitrary subexpression. So even the simple regular expression a* (i.e., any number of a characters) cannot be translated to a glob pattern.
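In practice this means the regex-to-glob direction can only cover a restricted subset and has to reject everything else. Here is a minimal sketch of that idea, assuming the regex is meant to match the whole string and only uses literals, escaped punctuation, . and .* (the function name regex_to_glob and the exact supported subset are choices made for this example, not a standard):

```python
QUANTIFIERS = {"*", "+", "?", "{"}
GLOB_SPECIAL = {"*", "?", "[", "{"}


def regex_to_glob(regex):
    """Translate a very small regex subset to a glob, or raise ValueError."""
    out = []
    i = 0
    while i < len(regex):
        c = regex[i]
        nxt = regex[i + 1:i + 2]  # "" at the end of the pattern
        if c == ".":
            piece, width = "?", 1
        elif c == "\\" and nxt and not (nxt.isalnum() or nxt == "\\"):
            # Escaped punctuation is a literal; protect it if the glob
            # would otherwise treat it as a wildcard.
            piece = "[" + nxt + "]" if nxt in GLOB_SPECIAL else nxt
            width = 2
        elif c.isalnum() or c in "-_/ ":
            piece, width = c, 1
        else:
            raise ValueError(f"unsupported regex construct at index {i}: {c!r}")

        follower = regex[i + width:i + width + 1]
        if c == "." and follower == "*":
            # '.*' is the only repetition a plain glob can express.
            piece, width = "*", 2
        elif follower in QUANTIFIERS:
            raise ValueError(f"cannot express a repetition of {c!r} as a glob")

        out.append(piece)
        i += width
    return "".join(out)
```

With this sketch, regex_to_glob(r".*-draft\.docx") gives *-draft.docx, while regex_to_glob("a*") raises ValueError instead of silently producing a glob with a different meaning.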

I'm not sure what types of globs CloudFront supports (the documentation returned no hits for the term "glob"), but here is some documentation on commonly-supported shell glob pattern wildcards.

Here is a summary of some equivalent sequences:

Glob Wildcard    Regular Expression    Meaning
?                .                     Any single character
*                .*                    Zero or more characters
[a-z]            [a-z]                 Any character from the range
[!a-m]           [^a-m]                A character not in the range
[abc]            [abc]                 One of the given characters
{cat,dog,bat}    (cat|dog|bat)         One of the given options
{*.tar,*.gz}     (.*\.tar|.*\.gz)      One of the given options, considering nested wildcards
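If you want to sanity-check equivalences like these, one option is to compare a glob matcher against the regex on a handful of sample strings. The sketch below uses Python's standard fnmatch module as the glob side. Note that fnmatch only implements *, ?, [seq] and [!seq], not brace alternation, so the {…} rows are left out, and the sample strings and the rows_agree helper are just picked for illustration.

```python
import fnmatch
import re

# (glob, regex) pairs for the wildcards that fnmatch understands.
ROWS = [
    ("?", "."),
    ("*", ".*"),
    ("[a-z]", "[a-z]"),
    ("[!a-m]", "[^a-m]"),
    ("[abc]", "[abc]"),
]

SAMPLES = ["", "a", "b", "n", "z", "A", "ab", "draft", ","]


def rows_agree(rows=ROWS, samples=SAMPLES):
    """Return True if every glob/regex pair agrees on every sample."""
    for glob, regex in rows:
        for s in samples:
            glob_hit = fnmatch.fnmatchcase(s, glob)
            regex_hit = re.fullmatch(regex, s) is not None
            if glob_hit != regex_hit:
                return False
    return True
```

A check like this is also a quick way to catch subtle differences. For instance, with fnmatch the class [a,b,c] matches a literal comma as well as a, b and c, because commas are not separators inside brackets.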