I'm using Perl to highlight errors through my browser as I scan through pages of text. At this point, I want to ensure the text Seq is preceded by a Maltese cross and a space (✠ ), otherwise highlight it. I also want to ignore n>Seq.
PS. If it's easier, I want to ignore >, but it will always be n>. In fact, it would always be </span> - whichever is easiest to check for.
Example phrase: ✠ Seq. S. Evangélii sec. Joánnem. — In illo témpore
I'm trying to replace xySeq if xy is NOT a Maltese cross and a space (✠ ), AND if xy is NOT the letter n followed by a greater-than symbol (n>).
In other words, I don’t want to substitute
✠ Seq
n>Seq
>Seq
</span>Seq
but I do want to replace things like
✠Seq
* Seq
a✠Seq
>aSeq
The following would work if I were just checking for single characters like ✠ or >:
my $span_beg = q(<span class='bcy'>); # HTML markup for highlighting
my $span_end = q(</span>);
$phr =~ s/([^✠>]Seq)/$span_beg$1$span_end/g;
but [^✠ >]Seq naturally treats the ✠ and the space as separate single characters, not as a pair.
I even tried [^(✠\s)>]Seq and a variable, [^$var>], but these didn't work.
I played with (?<!✠\s)Seq but didn't know how to incorporate >, or whether it was even the right way to go.
I hope this is possible, thanks for all.
Guy
CodePudding user response:
If you always want to tag Seq and exactly two characters before it, a couple of look-behinds might be enough:
s{..(?<!✠\s)(?<!n>)Seq}{$span_beg$&$span_end}g;
Or, with look-ahead:
s{(?!✠\s)(?!n>)..Seq}{$span_beg$&$span_end}g;
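As a rough check of the look-behind version, here is a minimal runnable sketch. It assumes the script is saved as UTF-8 and uses `use utf8`, so that the literal ✠ counts as one character and `..` really covers the two characters before Seq. The helper name `highlight_lookbehind` and the sample strings are only for illustration, and `$1` with an explicit capture is used in place of `$&`.

```perl
use strict;
use warnings;
use utf8;                               # literal ✠ in the pattern is one character
binmode STDOUT, ':encoding(UTF-8)';     # avoid "Wide character" warnings on print

my $span_beg = q(<span class='bcy'>);   # HTML markup for highlighting
my $span_end = q(</span>);

# Hypothetical helper: wraps Seq and the two characters before it,
# unless those two characters are "✠ " (cross + whitespace) or "n>".
sub highlight_lookbehind {
    my ($phr) = @_;
    $phr =~ s{(..(?<!✠\s)(?<!n>)Seq)}{$span_beg$1$span_end}g;
    return $phr;
}

# Two samples that should stay as they are, two that should be wrapped.
for my $sample ('✠ Seq', '</span>Seq', 'a✠Seq', '* Seq') {
    print highlight_lookbehind($sample), "\n";
}
```

Note that `..` needs two characters in front of Seq, so a Seq at the very start of the string (or right after a newline, since `.` does not match one without /s) would not be tagged by this pattern.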
CodePudding user response:
This should be more efficient than performing lookaround at every position:
# Doesn't include preceding characters in the span.
s{(✠ |>)?Seq}{ $1 ? $& : "$span_beg$&$span_end" }eg
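# Includes two preceding characters in the span.
# The exclusion is n> (two characters) rather than a bare >; with a bare >,
# the .. alternative matches n> one position earlier and n>Seq gets highlighted.
s{(?:(✠ |n>)|..)Seq}{ $1 ? $& : "$span_beg$&$span_end" }seg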
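To see how the first substitution (the one that leaves the preceding characters outside the span) behaves on the eight examples from the question, a small harness like the following might help. It is only a sketch: the helper name `highlight_seq` and the `@keep`/`@flag` arrays are made up for this illustration, and it assumes a UTF-8 source file with `use utf8`.

```perl
use strict;
use warnings;
use utf8;                               # literal ✠ in the pattern is one character
binmode STDOUT, ':encoding(UTF-8)';

my $span_beg = q(<span class='bcy'>);   # HTML markup for highlighting
my $span_end = q(</span>);

# Hypothetical helper: highlights Seq on its own unless "✠ " or ">"
# sits immediately before it. $1 is set only when that prefix matched.
sub highlight_seq {
    my ($phr) = @_;
    $phr =~ s{(✠ |>)?Seq}{ $1 ? $& : "$span_beg$&$span_end" }eg;
    return $phr;
}

my @keep = ('✠ Seq', 'n>Seq', '>Seq', '</span>Seq');   # should be left alone
my @flag = ('✠Seq', '* Seq', 'a✠Seq', '>aSeq');        # should be highlighted

for my $phr (@keep, @flag) {
    my $out = highlight_seq($phr);
    printf "%-12s %s\n", $phr, ($out eq $phr ? 'unchanged' : $out);
}
```

The optional group is tried first at each position, so when "✠ " or ">" is present the match starts there and the text is put back unchanged; otherwise the match starts at Seq itself and only Seq is wrapped.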