regex which will identify text containing a | but only if it is not within ' anywhere else in t

Time:01-14

I am trying to devise a Perl regex, for use as an if condition, that matches a | but fails to match if the | appears only inside a single-quoted string such as 'It went >| CRASH |< as it fell on the floor'

Example inputs are below

This should match:

action 71|55|279|286|155|57|343

This should fail to match:

action mud_destroyset($me,$arg,$arg1,$arg2,'gun','2','There is an almighty >| CRASH |< . When the smoke clears, both door and sphere are gone...','You hear the >| CRASH |< of a cannon going off in the distance.','',0,$cid,$oc) ;

I have tried a negative lookbehind regex as follows, fiddled with it extensively, and failed. I even asked ChatGPT, and it failed too.

These didn't work (the 1st one is ChatGPT's solution, the 2nd is my attempt):

^(?:(?!'\|).)*\|
(?<!').+\|.+

https://regex101.com/r/1o0SOM/1

https://regex101.com/r/z5Xz83/1

Help appreciated!

CodePudding user response:

One way could be removing anything between single quotes and then looking for the pipe:

index($txt =~ s/'[^']*'//gr, "|") != -1

An example run:

use strict;
use warnings;

my @texts = ("action 71|55|279|286|155|57|343",
             "action 'There is this >| CRASH |< .'");

for my $txt (@texts) {
    print index($txt =~ s/'[^']*'//gr, "|") != -1 ? "yes\n" : "no\n";
}

which gives

yes
no

  • ': literal single quote
  • [^']*: anything but a single quote, repeated as many times as possible
  • ': again a literal single quote
  • "g" flag: global replacement
  • "r" flag: nondestructive, i.e., return a new string

The index function then looks for the substring ("|") in the stripped copy and returns -1 if it cannot find it.
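
Since the question asks for something usable as an if condition, the same idea can be wrapped in a small helper. This is only a sketch: the name has_unquoted_pipe is made up for illustration, the second input is a shortened version of the line from the question, and it assumes quotes are balanced and never escaped inside a string. The r flag needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this example): true if $line
# contains a pipe outside of any single-quoted string.
# Assumes balanced quotes with no escaped quotes inside them.
sub has_unquoted_pipe {
    my ($line) = @_;
    # s///r returns a modified copy, so $line itself is left untouched.
    my $stripped = $line =~ s/'[^']*'//gr;
    return index($stripped, "|") != -1;
}

my @lines = (
    q{action 71|55|279|286|155|57|343},
    q{action mud_destroyset($me,'gun','There is an almighty >| CRASH |< .',0) ;},
);

for my $line (@lines) {
    if (has_unquoted_pipe($line)) {
        print "match:    $line\n";
    }
    else {
        print "no match: $line\n";
    }
}
```

A line with pipes both inside and outside quotes still counts as a match here, because only the quoted parts are removed before the search.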

CodePudding user response:

After some experimentation I found a solution that works for all my input data cases, although it is not perfect: because the lookahead is anchored at the start of the line, it will not match any line that contains a single quote anywhere, even if the pipe itself is outside the quoted string.

^(?!.*\').*\|

https://regex101.com/r/QZKMKS/1
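
To see where this lookahead regex stops being reliable, here is a small sketch that runs it against three lines. The inputs are made up for illustration (the third one is not from my data); it shows a line being rejected purely because a quote appears somewhere in it.

```perl
use strict;
use warnings;

# The lookahead at ^ rejects the whole line if a single quote occurs anywhere.
my $re = qr/^(?!.*').*\|/;

my @lines = (
    q{action 71|55|279},            # no quotes at all: matches
    q{action say('a >| b |< c')},   # pipes only inside quotes: rejected
    q{action 71|55 'note'},         # pipe outside quotes, but the quote still causes rejection
);

for my $line (@lines) {
    printf "%-32s %s\n", $line, ($line =~ $re ? "match" : "no match");
}
```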

A more sophisticated solution was also suggested on reddit:

'[^']+'(*SKIP)(*F)|\|

https://regex101.com/r/z5Xz83/2
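
As a rough illustration of how the backtracking-verb approach behaves, the sketch below tries it on a few made-up lines. The quantifier inside the quotes is written as +, which is my assumption about the intended pattern. The first alternative consumes a whole quoted string, then (*SKIP)(*F) forces a failure and stops the engine from retrying inside that string, so only a pipe outside quotes can satisfy the second alternative.

```perl
use strict;
use warnings;

# Assumed form of the pattern: '[^']+' consumes a quoted string, then
# (*SKIP)(*F) discards it; the second alternative matches a bare pipe.
my $re = qr/'[^']+'(*SKIP)(*F)|\|/;

my @lines = (
    q{action 71|55|279},            # no quotes: matches on the first pipe
    q{action say('a >| b |< c')},   # pipes only inside quotes: no match
    q{action 71|55 'note'},         # pipe outside quotes, quoted text elsewhere: matches
);

for my $line (@lines) {
    printf "%-32s %s\n", $line, ($line =~ $re ? "match" : "no match");
}
```

Unlike the anchored lookahead, this one accepts the third line, since quoted text elsewhere on the line does not block a pipe that sits outside the quotes.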
