I stored a RegExp object in a variable and used it to map an array of strings to an array of objects (parsed e-mail recipients), but it doesn't work: it behaves as if a RegExp object couldn't run its .exec() method more than once.
However, if I use a regular expression literal instead of the stored object, it works as intended.
I cannot understand the reason behind this behavior. Is it expected, or could it be a bug?
The code:
const pattern = /^\s*(?<name>\w.*?)?\W+(?<address>[a-zA-Z\d._-]+@[a-zA-Z\d._-]+\.[a-zA-Z\d_-]+)\W*$/gi;
const input = "John Doe [email protected]; Ronald Roe <[email protected]>";
const splitValues = input.split(/[\r\n,;]+/).map(s => s.trim()).filter(s => !!s);
const matchGroups1 = splitValues.map(s => pattern.exec(s));
console.log('Using pattern RegExp object:', JSON.stringify(matchGroups1, null, 2));
const matchGroups2 = splitValues.map(s => /^\s*(?<name>\w.*?)?\W+(?<address>[a-zA-Z\d._-]+@[a-zA-Z\d._-]+\.[a-zA-Z\d_-]+)\W*$/gi.exec(s));
console.log('Using literal regular expression:', JSON.stringify(matchGroups2, null, 2));
The output:
[LOG]: "Using pattern RegExp object:", "[
  [
    "John Doe [email protected]",
    "John Doe",
    "[email protected]"
  ],
  null
]"
[LOG]: "Using literal regular expression:", "[
  [
    "John Doe [email protected]",
    "John Doe",
    "[email protected]"
  ],
  [
    "Ronald Roe <[email protected]>",
    "Ronald Roe",
    "[email protected]"
  ]
]"
CodePudding user response:
The difference lies in the /g flag that you've passed to both regexes. From MDN:
The RegExp.prototype.exec() method with the g flag returns each match and its position iteratively.
const str = 'fee fi fo fum';
const re = /\w+\s/g;
console.log(re.exec(str)); // ["fee ", index: 0, input: "fee fi fo fum"]
console.log(re.exec(str)); // ["fi ", index: 4, input: "fee fi fo fum"]
console.log(re.exec(str)); // ["fo ", index: 7, input: "fee fi fo fum"]
console.log(re.exec(str)); // null
So /g on a regex turns the regex object itself into a funny sort of mutable state-tracker. When you call exec on a /g regex, you're matching and also setting a property on that regex (lastIndex) which remembers where it left off for next time. The intention is that if you match against the same string, you won't get the same match twice, allowing you to do mutable tricks with while loops, similar to the way you would write a global regex match in Perl.
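As a minimal sketch of that while-loop idiom, the snippet below reuses the sample string from the MDN example and collects every match from one string. The names wordRe, text, and found are only illustrative.

```javascript
// A global regex walked over ONE string with a while loop.
const wordRe = /\w+\s/g;
const text = 'fee fi fo fum';
const found = [];
let m;
// Each exec() call resumes at wordRe.lastIndex; it returns null when
// nothing more matches, and at that point lastIndex is reset to 0.
while ((m = wordRe.exec(text)) !== null) {
  found.push({ text: m[0], index: m.index });
}
console.log(found);
```

This works only because every call targets the same string; the loop relies on exactly the state that causes trouble when the strings differ.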
But since you're matching on two different strings, it causes problems. Let's look at a simplified example.
const re = /a/g;
re.exec("ab"); // Fine, we match against "a"
re.exec("ba"); // We start looking at the second character, so we match the "a" there.
re.exec("ab"); // We start looking at index 2, which is past the end of "ab", so we get *no* match.
Whereas in the case where you produce the regex every time, you never see this statefulness, since the regex object is made anew each time.
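To make the hidden state visible, here is a small sketch that records lastIndex before and after each call on a shared /g regex. The variable names (sharedRe, trace) are made up for illustration.

```javascript
// Trace lastIndex across exec() calls on different strings.
const sharedRe = /a/g;
const trace = [];
for (const s of ['ab', 'ba', 'ab', 'ab']) {
  const before = sharedRe.lastIndex;   // where this search will start
  const match = sharedRe.exec(s);
  trace.push({ input: s, before, matched: match !== null, after: sharedRe.lastIndex });
}
console.log(trace);
```

The third call starts at index 2 and fails, which resets lastIndex to 0, so the fourth call on the same input succeeds again.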
So the summary is: Don't use /g if you're planning to reuse the regex against multiple strings.
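A quick sketch of that advice, using a deliberately simple, hypothetical key=value pattern rather than the e-mail pattern from the question: without the g (or y) flag, exec always searches from the start and leaves lastIndex alone, so one stored regex can be mapped over many strings.

```javascript
// Hypothetical pattern and inputs, chosen only to keep the example short.
const pairRe = /^(?<key>\w+)=(?<value>\w+)$/i;   // note: no g flag
const lines = ['a=1', 'b=2', 'c=3'];
const parsed = lines.map(s => pairRe.exec(s));
console.log(parsed.map(r => r && { ...r.groups }));
console.log(pairRe.lastIndex);   // stays 0 for a non-global regex
```

Since each string is anchored with ^ and $ and can match at most once, the g flag added nothing here in the first place.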
CodePudding user response:
See "Why does Javascript's regex.exec() not always return the same value?". The issue is that exec on a /g regex is stateful: each call starts searching at lastIndex, the position just after the previous match. You can avoid the issue by setting pattern.lastIndex = 0; inside the map callback, or else by using a literal as you suggest.
const pattern = /^\s*(?<name>\w.*?)?\W+(?<address>[a-zA-Z\d._-]+@[a-zA-Z\d._-]+\.[a-zA-Z\d_-]+)\W*$/gi;
const input = "John Doe [email protected]; Ronald Roe <[email protected]>";
const splitValues = input.split(/[\r\n,;]+/).map(s => s.trim()).filter(s => !!s);
const matchGroups1 = splitValues.map(s => {pattern.lastIndex = 0; return pattern.exec(s)});
console.log('Using pattern RegExp object:', JSON.stringify(matchGroups1, null, 2));
const matchGroups2 = splitValues.map(s => /^\s*(?<name>\w.*?)?\W+(?<address>[a-zA-Z\d._-]+@[a-zA-Z\d._-]+\.[a-zA-Z\d_-]+)\W*$/gi.exec(s));
console.log('Using literal regular expression:', JSON.stringify(matchGroups2, null, 2));