I have an randomly ordered array of regular expressions like this:
let patterns = [
/foo ba r/,
/foo/,
/foo bar/,
/foobar/,
/m[eo]{4,}w/,
/boo/,
/fooo*/,
/meow/
]
I'm not sure if this is possible but I would like to write an algorithm which sorts the regular expressions from least greedy to most greedy, like this:
[
/foo/,
/boo/,
/fooo*/,
/meow/,
/foobar/,
/foo bar/,
/m[eo]{4,}w/,
/foo ba r/
]
I would imagine such sorting could be achieved like so:
patterns.sort((p1, p2) { return p1.greediness() - p2.greediness() });
But there exists no method called greediness
in the the RegExpr
class.
Ideally, the greediness
method would return the number of characters which could be possibly matched at minimum. i.e:
/foo/.greediness() == 3
/boo/.greediness() == 3
/fooo*/.greediness() == 3
/meow/.greediness() == 4
/foobar/.greediness() == 6
/foo bar/.greediness() == 6
/m[eo]{4,}w/.greediness() == 6
/foo ba r/.greediness() == 6
What would your solution be to this problem?
CodePudding user response:
Here is a concept of what you might use as an idea.
You can use prototype to define a custom function for the RegEx:
// Extend Regular Expression with a custom method
RegExp.prototype.greediness = function () {
return (this.source.match(/[* {]/g) || []).length
}
And then you perform a sorting on that patterns:
// Sort patterns by greediness
const patternByGreediness = patterns.sort((a, b) => {
return b.greediness() - a.greediness()
})
Check full code snippet
let patterns = [
/foo ba r/,
/foo/,
/foo bar/,
/foobar/,
/m[eo]{4,}w/,
/boo/,
/fooo*/,
/meow/
]
// Sort
patterns.sort((a, b) => {
return b.source.length - a.source.length
}).reverse()
// Reverse order as you want
console.log(patterns)
// Extend Regular Expression with a custom method
RegExp.prototype.greediness = function () {
return (this.source.match(/[* {]/g) || []).length
}
// Sort patterns by greediness
const patternByGreediness = patterns.sort((a, b) => {
return b.greediness() - a.greediness()
})
console.log(patternByGreediness)
// Get a number of how greedy a pattern is
console.log(/m[eo]{4,}w/.greediness())
Hope that helps.
CodePudding user response:
As Pointy said in the comments, this is a hard problem.
Here is the beginning of a solution:
const greediness = (s) =>
s .toString () .slice (1, -1)
.replace (/\[[^\]] ]/g, 'X')
.replace (/.\{((\d )(,\d*))\}/g, (s, a, ds, _) => 'X' .repeat (Number (ds)))
.replace (/.\ /g, 'X')
.replace (/.\?/g, '')
.replace (/.\*/g, '')
.length
const sortByGreediness = (patterns) =>
[...patterns] .sort ((a, b) => greediness (a) - greediness (b))
// .map (s => [s, greediness (s)]) // to show sizes
const patterns = [/foo ba r/, /foo/, /foo bar/, /foobar/, /m[eo]{4,}w/, /boo/, /fooo*/, /meow/]
console .log (sortByGreediness (patterns))
We simply take the text of the regex and replace quantifiers and their preceding characters with the smallest number of characters that might match. We do something similar for blocks like [eo]
and X{4,}
.
This we might go through steps like this:
m[eo]{4,}wp u r?r*
mX{4,}wp u r?r*
mXXXXwp u r?r*
mXXXXXwXr?r*
mXXXXXwXr*
mXXXXXwX
- length 7
But this doesn't touch on the complexities that can be inside a regex, and doesn't even try to handle capturing groups. I think it would next to impossible to do for the full regex spec, but perhaps this can be expanded toward what you need.
(If you're getting more complex, you might want to do this repeatedly with code something like the following, or with a while
loop in place of its recursion.
const greediness = (s, next = s .toString () .slice (1, -1), prev = null) =>
next === prev
? next .length
: greediness (s, next
.replace (/\[[^\]] ]/g, 'X')
.replace (/.\{((\d )(,\d*))\}/g, (s, a, ds, _) => 'X' .repeat (Number (ds)))
.replace (/.\ /g, 'X')
.replace (/.\?/g, '')
.replace (/.\*/g, '')
, next)