Variable length look-behind

Time:11-15

Is there an elegant way to build a variable-length look-behind regex such as this one?

/(?<=eat_(apple|pear|orange)_)today|yesterday/g;

Perl seems to have a very impressive regex engine, and variable-length look-behind would be very useful. Is there a way to make it work, or should I forget this bad idea?

CodePudding user response:

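Use \K as a special case.

It behaves like a variable-length positive lookbehind assertion: whatever is matched before \K must be present, but it is left out of the returned match: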

/eat_(?:apple|pear|orange)_\Ktoday|yesterday/g
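To see what \K buys you, here is a minimal sketch that replaces only the day word and leaves the prefix alone. The helper name replace_day and the sample strings are made up for illustration, and the two day words are grouped with (?:...) on the assumption that both are meant to require the prefix. \K needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the day that follows eat_<fruit>_ with $new.
# Everything matched before \K is required but is not part of the text
# that gets replaced.
sub replace_day {
    my ($text, $new) = @_;
    (my $copy = $text) =~ s/eat_(?:apple|pear|orange)_\K(?:today|yesterday)/$new/g;
    return $copy;
}

# eat_kiwi_ is not one of the listed prefixes, so that "today" stays as-is.
print replace_day('eat_apple_today eat_kiwi_today eat_pear_yesterday', 'tomorrow'), "\n";
```

One difference from a true lookbehind: the prefix is consumed during the match attempt, so it cannot overlap with text already consumed by a previous /g match.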

Alternatively, you can list out your lookbehind assertions separately:

/(?:(?<=eat_apple_)|(?<=eat_pear_)|(?<=eat_orange_))today|yesterday/g

However, I would suggest that problems which truly need that feature, and cannot be rethought to use a combination of other, more common regex features, are going to be rare.

In other words, if you get stuck on a specific problem, feel free to share it here, and I'm sure someone can come up with a different (perhaps better) approach.

CodePudding user response:

How about:

(?:(?<=eat_apple_)|(?<=eat_pear_)|(?<=eat_orange_))(today|yesterday)

A little bit ugly, but it works.
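A short sketch of how this pattern behaves in Perl; the helper name days_after_fruit and the sample text are illustrative only. Each lookbehind is fixed-width on its own, which is why the engine accepts the alternation even though the three prefixes differ in length.

```perl
use strict;
use warnings;

# Three separate fixed-width lookbehinds, combined by alternation.
my $re = qr/(?:(?<=eat_apple_)|(?<=eat_pear_)|(?<=eat_orange_))(today|yesterday)/;

# Hypothetical helper: return every day word that is preceded by one of
# the eat_<fruit>_ prefixes.
sub days_after_fruit {
    my ($text) = @_;
    my @days = $text =~ /$re/g;   # list context + /g returns the captures
    return @days;
}

print join(',', days_after_fruit('eat_apple_today eat_kiwi_today eat_orange_yesterday')), "\n";
```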

CodePudding user response:

Blog post found today, linked to me at #regex @ irc.freenode.org:

http://www.drregex.com/2019/02/variable-length-lookbehinds-actually.html

This article explains how to do a variable width look-behind in PCRE.

The solution would then be:

/(?=(?=(?'a'[\s\S]*))(?'b'eat_(?:apple|pear|orange)_(?=\k'a'\z)|(?<=(?=x^|(?&b))[\s\S])))today|yesterday/g

https://regex101.com/r/9DNpFj/1

CodePudding user response:

You can use look-ahead instead of look-behind, if matching the prefix rather than the day is acceptable:

/(?:eat_(apple|pear|orange)_)(?=today|yesterday)/g

and in general, there is usually an alternative way to describe things that naively seem to require look-behind.
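One such alternative, sketched here under the assumption that you only need the pieces and not a match that starts exactly at the day word, is to match the prefix normally and use capture groups. The helper name fruit_day_pairs is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return [fruit, day] pairs. The prefix is matched
# as ordinary text, and the interesting parts are pulled out via $1 and $2.
sub fruit_day_pairs {
    my ($text) = @_;
    my @pairs;
    while ($text =~ /eat_(apple|pear|orange)_(today|yesterday)/g) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

for my $pair (fruit_day_pairs('eat_apple_today, eat_pear_yesterday')) {
    print "$pair->[0] => $pair->[1]\n";
}
```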

CodePudding user response:

Perl v5.30 adds experimental variable-width lookbehinds in situations where the regex engine knows that the length will be 255 characters or less (so, no unbounded quantifiers, for example).

This now works:

use v5.30;
use experimental qw(vlb);

$_ = 'eat_apple_today';
say "Matched!" if /(?<=eat_(apple|pear|orange)_)today|yesterday/g;

CodePudding user response:

Alternative solution: reverse the string and use look-ahead instead. Having to write the pattern words in reverse may look ugly, but it's an option when everything else fails.
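A rough sketch of the reversal idea, assuming you want the positions of the day words in the original string. The helper name day_offsets is made up; the look-ahead in the reversed string plays the role of the look-behind in the original, and look-ahead has no width restriction.

```perl
use strict;
use warnings;

# Hypothetical helper: return the start offsets (in the ORIGINAL string)
# of each day word that is preceded by eat_<fruit>_.
sub day_offsets {
    my ($text) = @_;
    my $rev = reverse $text;   # scalar context: reverses the characters
    my @offsets;
    # Everything is spelled backwards: today -> yadot, yesterday -> yadretsey,
    # and the prefix eat_<fruit>_ becomes _<tiurf>_tae AFTER the day word.
    while ($rev =~ /(?:yadot|yadretsey)(?=_(?:elppa|raep|egnaro)_tae)/g) {
        # The end of the match in the reversed string corresponds to the
        # start of the day word in the original string.
        push @offsets, length($text) - $+[0];
    }
    my @sorted = sort { $a <=> $b } @offsets;
    return @sorted;
}

my $text = 'eat_apple_today eat_kiwi_today eat_pear_yesterday';
print join(',', day_offsets($text)), "\n";
```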

CodePudding user response:

The solution that worked for me: temporarily make whatever is variable in length fixed in length.

In this case, change all your eat_apple, eat_pear and eat_orange prefixes to something like eat_fruit, and then run the expression you had in mind with a fixed-length look-behind, which the engine accepts. Even though it takes two passes and some extra memory, I find the code much easier to read, and it might be faster than some of these other solutions.
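The two passes could look roughly like this. The helper name count_days_after_fruit is hypothetical, and the sketch assumes the placeholder eat_fruit_ never occurs in the input already, since such occurrences would then match as well.

```perl
use strict;
use warnings;

# Hypothetical helper: count the day words that follow eat_<fruit>_.
sub count_days_after_fruit {
    my ($text) = @_;

    # Pass 1: on a copy, collapse every variable-length prefix to the
    # fixed-length placeholder eat_fruit_.
    (my $normalized = $text) =~ s/eat_(?:apple|pear|orange)_/eat_fruit_/g;

    # Pass 2: a plain fixed-width lookbehind now does the job.
    my $count = () = $normalized =~ /(?<=eat_fruit_)(?:today|yesterday)/g;
    return $count;
}

print count_days_after_fruit('eat_apple_today eat_kiwi_today eat_pear_yesterday'), "\n";
```

Note that offsets found in the normalized copy do not line up with the original string, so this fits best when you only need the matched text or a count.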
