What does `/regex/o` really mean (once there was once, but it seems gone now)?

(sorry for the title, but this "feature" really confuses me)

While learning Perl, I learned that a regular expression that interpolates variables and carries the o modifier is evaluated only once, even if a variable changes after the initial evaluation. At first that seems unproblematic, since it is clearly specified.

Obviously, that initial evaluation cannot happen before the variable being used has got its value.

Now qr made life a bit more interesting. Consider this code (executed in a loop defining other variables, too):

{
    my $n = $name;
    $n =~ s/[^\w\.-]/_/g;
    $n = qr:^${n}\@${another_variable}$:o;
    @a = grep { !/$n/ } @a;
}

When the qr regex is used directly, one could argue that the regex is compiled only once, even if the variable goes out of scope (does going out of scope count as a change of the variable?).

But when using qr to build a regex and assigning it to a lexical variable, the compiled regex would go out of scope, so I expected that the regex could not be reused and would be rebuilt. (The basic idea was that the regex inside grep shouldn't be rebuilt for every iteration.)

As life is cruel, it seems the whole regex referenced by $n isn't ever rebuilt, so the first value is used until the program stops.

Interestingly, perlre(1) in Perl 5.18.2 (the version being used) no longer mentions the o modifier, and Perl 5.26.1 says in the corresponding page:

o - pretend to optimize your code, but actually introduce bugs

So can anybody explain the rules for "once" evaluation (and whether the semantics have changed over the lifespan of Perl)?

CodePudding user response：

Perl has a couple of constructs that don't just store state in variables but also keep some state in the opcodes themselves. Aside from /o regex patterns, this includes the .. flip-flop operator (in scalar context) and state variables.

Perhaps state variables are clearest, since they correspond to static local variables in many other languages (e.g. C). A state variable is initialized at most once during the lifetime of the program. An expression state $var = initialize() can be understood as

my $var;
if (previously_initialized) {
   $var = cached_value;
} else {
   $var = initialize();
}

This does not track dependencies in the initialize() expression, but only evaluates it once.
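As a runnable illustration of that point, here is a minimal sketch. The names initialize, cached and $calls are made up for the example and are not part of any Perl API; the counter only exists to show how often the initializer runs.

```perl
use strict;
use warnings;
use feature 'state';

my $calls = 0;    # counts how often the initializer actually runs

# Hypothetical initializer, standing in for initialize() above.
sub initialize {
    my ($arg) = @_;
    $calls++;
    return "built from $arg";
}

sub cached {
    my ($arg) = @_;
    # The right-hand side runs only the first time this statement executes;
    # later changes to $arg are not noticed.
    state $var = initialize($arg);
    return $var;
}

my @results = map { cached($_) } qw(a b c);
print "$_\n" for @results;                   # the same string three times
print "initializer ran $calls time(s)\n";
```

Every call returns the value computed from the first argument, a, even though the later calls pass b and c.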

Similarly, it can make sense to consider a regex pattern /.../o as a kind of hidden state variable state $compiled_pattern = qr/.../.

The /o feature was a good idea a very long time ago when regexes were compiled on the fly, similarly to how it works in other languages where regex patterns are provided to a search function as a string.

It hasn't been necessary for performance purposes for a long time, and it only has an effect when doing variable interpolation. But if you actually want that behaviour, using a state variable would communicate that intent more clearly. Thus, I'd argue that there is no appropriate use for the /o modifier.
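A small sketch of that suggestion, assuming the "compile once and freeze" behaviour is really what is wanted. The two helper subs are hypothetical and exist only to contrast a state variable with an ordinary lexical.

```perl
use strict;
use warnings;
use feature 'state';

# Hypothetical helper: freezes the pattern on the first call, as /o would.
sub matches_first_pattern {
    my ($str, $pat) = @_;
    state $re = qr/$pat/;    # compiled once; later $pat values are ignored
    return $str =~ $re ? 1 : 0;
}

# Same helper without the freeze: compiled from the current $pat on each call.
sub matches_current_pattern {
    my ($str, $pat) = @_;
    my $re = qr/$pat/;
    return $str =~ $re ? 1 : 0;
}

my @frozen  = map { matches_first_pattern('a1', $_) }   qw(a b c);
my @current = map { matches_current_pattern('a1', $_) } qw(a b c);
print "state: @frozen\n";
print "my:    @current\n";
```

With state, the word itself tells the reader that the pattern is deliberately built only once, which a trailing /o does not make obvious.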

CodePudding user response：

The "largely obsolete /o" (perlop) flag still has the "once" meaning and operation. While perlre mentions it only barely and in passing, it is addressed in perlop:

/PATTERN/msixpodualngc
...
... Perl will not recompile the pattern unless an interpolated variable that it contains changes. You can force Perl to skip the test and never recompile by adding a /o (which stands for "once") after the trailing delimiter. Once upon a time, Perl would recompile regular expressions unnecessarily, and this modifier was useful to tell it not to do so, in the interests of speed. But now, the only reasons to use /o are one of:

The variables are thousands of characters long and you know that they don't change, and you need to wring out the last little bit of speed by having Perl skip testing for that. (There is a maintenance penalty for doing this, as mentioning /o constitutes a promise that you won't change the variables in the pattern. If you do change them, Perl won't even notice.)

you want the pattern to use the initial values of the variables regardless of whether they change or not. (But there are saner ways of accomplishing this than using /o.)

If the pattern contains embedded code, such as
use re 'eval';  
$code = 'foo(?{ $x })';  
/$code/  
then perl will recompile each time, even though the pattern string hasn't changed, to ensure that the current value of $x is seen each time. Use /o if you want to avoid this.

The bottom line is that using /o is almost never a good idea.

So, indeed, with /o Perl won't even test whether the interpolated variables have changed, and this may have a legitimate use. But, all told, it probably shouldn't be used.

An example to demonstrate the "once" operation

perl -Mstrict -wE'
    sub tt { 
        my ($str, $v) = @_; 
        my $re = qr/$v/o; 
        $str =~ s/$re/X/; 
        return $str 
    };  
    for (qw(a b c)) { say tt( q(a1), $_ ) }'

With the /o, either on the qr-ed pattern or on the regex, the pattern matches the string a1 in every iteration, even though it was compiled with a only in the first iteration. Clearly the pattern isn't recompiled, since the variable later holds b and then c, which shouldn't match.

Without /o, the regex matches only in the first iteration.
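For the loop in the question, the goal of not rebuilding the regex for every grep element can probably be reached by simply dropping /o from the qr. The sketch below uses made-up data (@names, $domain and the contents of @a are assumptions standing in for the question's variables) and adds \Q...\E so that the interpolated strings are matched literally.

```perl
use strict;
use warnings;

# Hypothetical data standing in for the question's $name, $another_variable and @a.
my @names  = ('alice', 'bob');
my $domain = 'example.org';
my @a      = ('alice@example.org', 'bob@example.org', 'carol@example.org');

for my $name (@names) {
    (my $n = $name) =~ s/[^\w.-]/_/g;
    # No /o: qr// builds a pattern from the current values each time this
    # statement runs, i.e. once per loop iteration.
    # \Q...\E escapes regex metacharacters in the interpolated strings.
    my $re = qr/^\Q$n\E\@\Q$domain\E$/;
    # $re is already compiled, so matching against it inside grep does not
    # require recompiling it for every element.
    @a = grep { !/$re/ } @a;
}

print "@a\n";
```

The pattern is built once per pass through the loop from the current values, and the grep reuses that compiled object for all elements.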
