Perl regex capture each character as one group

Time:09-20

I have:

Aenean placerat >> /example/alpha.txt
est et rutrum ultrices >> /example/beta.txt
dolor nibh ultricies nulla >> /example/gamma delta.txt

I need:

Aenean placerat >> \/\e\x\a\m\p\l\e\/\a\l\p\h\a\.\t\x\t
est et rutrum ultrices >> \/\e\x\a\m\p\l\e\/\b\e\t\a\.\t\x\t
dolor nibh ultricies nulla >> \/\e\x\a\m\p\l\e\/\g\a\m\m\a\ \d\e\l\t\a\.\t\x\t

My attempt below obviously does not work, but I can't find a way to achieve this:

's/(.*) >> (.)*/$1 >> \\$2/gm'
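A likely reason the attempt fails: a quantified capture group such as (.)* is reassigned on every repetition, so after the match $2 holds only the last character of the path. A minimal sketch showing this on the first sample line (the variable names are illustrative only):

```perl
use strict;
use warnings;

my $line = 'Aenean placerat >> /example/alpha.txt';

# (.)* repeats the group, but $2 only remembers the final repetition.
if ($line =~ /(.*) >> (.)*/) {
    print "\$1 = '$1', \$2 = '$2'\n";    # $1 = 'Aenean placerat', $2 = 't'
}

# So the attempted substitution collapses the whole right side into one escaped char.
my $attempt = $line =~ s/(.*) >> (.)*/$1 >> \\$2/gmr;
print "$attempt\n";                      # Aenean placerat >> \t
```

Any working solution therefore has to visit each character of the right side individually, either with a second substitution or with repeated matches.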

Answer:

A quick and simple way is to nest a regex substitution inside the replacement part of your regex substitution.

use strict;
use warnings;

while (<DATA>) {
    s/>> \K(.+)/ $1 =~ s#(.)#\\$1#gr /e;
    #            ^^^^^^^^^^^^^^^^^^^ inner substitution
    print;
}

__DATA__
Aenean placerat >> /example/alpha.txt
est et rutrum ultrices >> /example/beta.txt
dolor nibh ultricies nulla >> /example/gamma delta.txt

The /e (eval) modifier tells Perl to evaluate the RHS as code. Note the use of alternative delimiters on the inner substitution operator, s###, so they don't clash with the outer s///, and the use of the /r modifier to return the modified value only (we can't modify $1 anyway, since it is read-only; /r requires Perl 5.14 or later). The \K escape tells Perl to "keep" the part of the match to its left (here ">> "), so only the text after it is replaced.

This can be used as a simple one-liner:

perl -pe's/>> \K(.+)/ $1 =~ s#(.)#\\$1#gr /e' yourfile.txt
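To make the roles of /r and \K from the answer above easier to see, here is a sketch that wraps the same substitution in a sub. The name escape_after_arrow is hypothetical, and the extra /r on the outer substitution is an addition of this sketch so the caller's string is not modified:

```perl
use strict;
use warnings;

# escape_after_arrow is a hypothetical name for a sub wrapping the answer's
# substitution; the outer /r (added here) returns a copy instead of editing $line.
sub escape_after_arrow {
    my ($line) = @_;
    # /e: evaluate the replacement as Perl code.
    # Inner s###...#gr: escape each character of a copy of the read-only $1.
    return $line =~ s/>> \K(.+)/ $1 =~ s#(.)#\\$1#gr /er;
}

my $line    = 'dolor nibh ultricies nulla >> /example/gamma delta.txt';
my $escaped = escape_after_arrow($line);
print "$escaped\n";
print "$line\n";    # still the original text

# \K keeps ">> " out of the replaced part; without it, ">> " is replaced too.
my $with_k    = 'a >> bc' =~ s/>> \K.+/X/r;    # 'a >> X'
my $without_k = 'a >> bc' =~ s/>> .+/X/r;      # 'a X'
print "$with_k\n$without_k\n";
```

Because . does not match a newline, (.+) stops before the line terminator, so the newline is never escaped in the loop or one-liner forms either.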

Answer:

One approach is to split the line in two, then apply the substitution only to the right side:

use warnings;
use strict;

while (<DATA>) {
    my ($x, $y) = split /\s+>>\s+/;
    $y =~ s/(.)/\\$1/g;
    print "$x >> $y";
}

__DATA__
Aenean placerat >> /example/alpha.txt
est et rutrum ultrices >> /example/beta.txt
dolor nibh ultricies nulla >> /example/gamma delta.txt

Output:

Aenean placerat >> \/\e\x\a\m\p\l\e\/\a\l\p\h\a\.\t\x\t
est et rutrum ultrices >> \/\e\x\a\m\p\l\e\/\b\e\t\a\.\t\x\t
dolor nibh ultricies nulla >> \/\e\x\a\m\p\l\e\/\g\a\m\m\a\ \d\e\l\t\a\.\t\x\t
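A possible refinement of the split approach, sketched as a hypothetical helper escape_right_side: passing a LIMIT of 2 to split keeps any later ">>" on the right side instead of silently dropping it, and lines without a separator are returned unchanged rather than triggering an "uninitialized value" warning. Both behaviours are assumptions about what you want, not requirements from the question:

```perl
use strict;
use warnings;

# escape_right_side is a hypothetical helper based on the split approach above.
sub escape_right_side {
    my ($line) = @_;
    # LIMIT of 2: only the first " >> " splits the line, so any later ">>"
    # stays in $right instead of being dropped.
    my ($left, $right) = split /\s+>>\s+/, $line, 2;
    # No separator at all: return the line untouched (and avoid an
    # "uninitialized value" warning from the substitution below).
    return $line unless defined $right;
    # "." does not match "\n", so a trailing newline is kept as-is.
    $right =~ s/(.)/\\$1/g;
    return "$left >> $right";
}

print escape_right_side("est et rutrum ultrices >> /example/beta.txt\n");
print escape_right_side("a >> b >> c\n");
print escape_right_side("no separator here\n");
```

Note that split also normalizes the whitespace around the separator to a single space on each side, which matches the sample data but may matter for other input.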

Answer:

You can match up to the first occurrence of >>, use \K to discard what has been matched so far, and use \G to continue matching one character at a time from where the previous match ended.

(?:^.*?>>\h*|\G(?!^))\K.

Explanation

  • (?: Non-capturing group for the alternation
    • ^.*?>>\h* Match until the first occurrence of >>, followed by optional horizontal whitespace characters
    • | Or
    • \G(?!^) Assert the position at the end of the previous match, but not at the start of the string
  • ) Close the non-capturing group
  • \K Forget what has been matched so far
  • . Match a single character


In the replacement, use the full match $& preceded by an escaped backslash: \\$&

Example

use strict;
use warnings;

while (<DATA>) {
    s/(?:^.*?>>\h*|\G(?!^))\K./\\$&/g;
    print;
}

__DATA__
Aenean placerat >> /example/alpha.txt
est et rutrum ultrices >> /example/beta.txt
dolor nibh ultricies nulla >> /example/gamma delta.txt

Output

Aenean placerat >> \/\e\x\a\m\p\l\e\/\a\l\p\h\a\.\t\x\t
est et rutrum ultrices >> \/\e\x\a\m\p\l\e\/\b\e\t\a\.\t\x\t
dolor nibh ultricies nulla >> \/\e\x\a\m\p\l\e\/\g\a\m\m\a\ \d\e\l\t\a\.\t\x\t
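To see why each part of the pattern in the last answer is needed, here is a sketch that removes one piece at a time on two made-up inputs ('a >> xy' and 'no arrow'). Without the \G branch only one character is escaped, and without the (?!^) guard, \G matches at position 0 on a line that has no >>, so the whole line gets escaped:

```perl
use strict;
use warnings;

my $line  = 'a >> xy';
my $plain = 'no arrow';

# Full pattern from the answer: escapes every character after ">> ".
my $full = $line =~ s/(?:^.*?>>\h*|\G(?!^))\K./\\$&/gr;        # a >> \x\y

# Without the \G branch, ^ can only match once, so only one character is escaped.
my $no_g = $line =~ s/^.*?>>\h*\K./\\$&/gr;                    # a >> \xy

# With the (?!^) guard, a line without ">>" is left alone...
my $guarded = $plain =~ s/(?:^.*?>>\h*|\G(?!^))\K./\\$&/gr;    # no arrow

# ...but without it, \G matches at position 0 and the whole line is escaped.
my $unguarded = $plain =~ s/(?:^.*?>>\h*|\G)\K./\\$&/gr;       # \n\o\ \a\r\r\o\w

print "$_\n" for $full, $no_g, $guarded, $unguarded;
```

The /r flag is used here only so each variant can be compared side by side; the answer's in-place s///g in the while loop behaves the same way.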