Is there an elegant way to build a variable-length lookbehind regex such as this one?
/(?<=eat_(apple|pear|orange)_)today|yesterday/g;
It seems Perl has a very impressive regex engine, and variable-length lookbehind would be very useful. Is there a way to make it work, or should I forget this bad idea?
CodePudding user response:
Use \K as a special case.
It effectively acts as a variable-length positive lookbehind, because everything matched before \K is left out of the reported match:
/eat_(?:apple|pear|orange)_\Ktoday|yesterday/g
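As a minimal sketch of how \K behaves, the snippet below runs a grouped version of that pattern against a made-up sample string (the input and variable names are my own assumptions, not from the question). Note that alternation binds loosest, so the pattern as written also matches a bare yesterday anywhere; the sketch groups the two words on the assumption that the prefix is meant to apply to both.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the question.
my $text = 'eat_apple_today eat_kiwi_today eat_pear_yesterday';

# Everything before \K must match, but is dropped from the reported match.
# (?:today|yesterday) is grouped so the prefix applies to both words.
my @hits = $text =~ /eat_(?:apple|pear|orange)_\K(?:today|yesterday)/g;
print join(', ', @hits), "\n";    # prints: today, yesterday

# In a substitution, only the part after \K is replaced.
(my $changed = $text) =~ s/eat_(?:apple|pear|orange)_\K(?:today|yesterday)/tomorrow/g;
print "$changed\n";               # prints: eat_apple_tomorrow eat_kiwi_today eat_pear_tomorrow
```

One difference from a true lookbehind is that the prefix is consumed by the match, so text that \K has skipped over cannot take part in a later match.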
Alternatively, you can list out your lookbehind assertions separately:
/(?:(?<=eat_apple_)|(?<=eat_pear_)|(?<=eat_orange_))today|yesterday/g
However, I would suggest that it is rare for a problem to need that feature and be impossible to rework using a combination of other, more common regex features.
In other words, if you get stuck on a specific problem, feel free to share it here, and I'm sure someone can come up with a different (perhaps better) approach.
CodePudding user response:
How about:
(?:(?<=eat_apple_)|(?<=eat_pear_)|(?<=eat_orange_))(today|yesterday)
A little bit ugly, but it works.
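If the list of prefixes is long, the ugliness can be kept out of sight by generating the alternation. This is only a sketch under my own assumptions (the @fruits list, the sample text and the variable names are hypothetical); each generated branch is still a plain fixed-length lookbehind.

```perl
use strict;
use warnings;

# Hypothetical list of allowed words; quotemeta guards against regex metacharacters.
my @fruits = qw(apple pear orange);
my $behind = join '|', map { '(?<=eat_' . quotemeta($_) . '_)' } @fruits;

# Each branch of $behind has a fixed length, so no variable-length lookbehind is needed.
my $re = qr/(?:$behind)(today|yesterday)/;

# Hypothetical sample input.
my $text = 'eat_pear_today eat_kiwi_yesterday eat_orange_yesterday';
my @hits = $text =~ /$re/g;
print join(', ', @hits), "\n";    # prints: today, yesterday
```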
CodePudding user response:
Blog post found today, linked to me at #regex @ irc.freenode.org:
http://www.drregex.com/2019/02/variable-length-lookbehinds-actually.html
This article explains how to do a variable width look-behind in PCRE.
The solution would then be:
/(?=(?=(?'a'[\s\S]*))(?'b'eat_(?:apple|pear|orange)_(?=\k'a'\z)|(?<=(?=x^|(?&b))[\s\S])))today|yesterday/g
https://regex101.com/r/9DNpFj/1
CodePudding user response:
You can use look-ahead instead of look-behind:
/(?:eat_(apple|pear|orange)_)(?=today|yesterday)/g
In general, there is an alternative way to describe things that naively seem to require look-behind.
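A small sketch of what the look-ahead version returns, using a made-up input string (my assumption, not from the question). Be aware that here the match is the eat_..._ prefix rather than the day word, so this fits best when the fruit is the part you want to capture.

```perl
use strict;
use warnings;

# Hypothetical sample input.
my $text = 'eat_apple_today eat_pear_never eat_orange_yesterday';

# The lookahead checks for the day word without consuming it;
# the capture group returns the fruit for each successful match.
my @fruits = $text =~ /eat_(apple|pear|orange)_(?=today|yesterday)/g;
print join(', ', @fruits), "\n";    # prints: apple, orange
```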
CodePudding user response:
Perl v5.30 adds experimental variable-width lookbehinds in situations where the regex engine knows that the length will be 255 characters or less (so, no unbounded quantifiers, for example).
This now works:
use v5.30;
use experimental qw(vlb);
$_ = 'eat_apple_today';
say "Matched!" if /(?<=eat_(apple|pear|orange)_)today|yesterday/g;
CodePudding user response:
Alternative solution: reverse the string and use lookahead instead. It may look ugly to write the pattern words in reverse, but it's an option when everything else fails.
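A rough sketch of the reversal idea, under my own assumptions (the rev helper, the sample text and the word lists are hypothetical). To avoid typing the words backwards by hand, the pattern pieces are reversed in code; the two day words are grouped on the assumption that the prefix should apply to both.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the characters of a single string.
sub rev { return scalar reverse $_[0] }

# Hypothetical sample input.
my $text = 'eat_apple_today eat_kiwi_today eat_pear_yesterday';

# Build the reversed pattern pieces instead of writing them backwards by hand.
my $words  = join '|', map { rev($_) } qw(today yesterday);
my $fruits = join '|', map { rev($_) } qw(apple pear orange);
my $tae    = rev('eat');

# In the reversed string the prefix comes after the word, so a lookahead can check it.
my $re = qr/(?:$words)(?=_(?:$fruits)_$tae)/;

my $reversed = rev($text);
my @found    = $reversed =~ /$re/g;

# Un-reverse each match and restore the original left-to-right order.
my @hits = reverse map { rev($_) } @found;
print join(', ', @hits), "\n";    # prints: today, yesterday
```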
CodePudding user response:
The solution that worked for me:
Temporarily make whatever is variable in length fixed in length.
In this case:
Change all your 'eat_apple's, 'eat_pear's and 'eat_orange's to something like eat_fruit, and then run the expression you were thinking of with an acceptable fixed length look-behind. Even though it takes two passes and some memory, I find the code way easier to read, and it might be faster than some of these other solutions.
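Roughly, the two passes could look like the sketch below. The sample text and the eat_fruit_ placeholder are my own assumptions, and the day words are grouped so the look-behind applies to both. Keep in mind that if the input can already contain the literal eat_fruit_, the second pass will also match those, and match offsets refer to the normalized copy rather than the original string.

```perl
use strict;
use warnings;

# Hypothetical sample input.
my $text = 'eat_apple_today eat_kiwi_today eat_pear_yesterday';

# Pass 1: rewrite every variable-length prefix to one fixed-length placeholder,
# working on a copy so the original string is left untouched.
(my $normalized = $text) =~ s/eat_(?:apple|pear|orange)_/eat_fruit_/g;

# Pass 2: an ordinary fixed-length look-behind is now enough.
my @hits = $normalized =~ /(?<=eat_fruit_)(?:today|yesterday)/g;
print join(', ', @hits), "\n";    # prints: today, yesterday
```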