Perl regex capture groups and reshuffle pattern

Time:07-30

I am using Perl regex capture groups to rewrite a pattern across a large number of files.

File example 1:

title="alpha" lorem ipsum lorem ipsum name="beta"

File example 2:

title="omega" Morbi posuere metus purus name="delta"

should become

title="beta" lorem ipsum lorem ipsum
title="delta" Morbi posuere metus purus

using

find . -type f -exec perl -pi -w -e 's/title="(?'one'.*?)"(?'three'.*?)name="(?'two'.*?)"/title="\g{two}"\g{three}/g;' \{\} \;

(Note that (1) the attribute values of title and name are unknown variables and (2) the content between title="alpha" and name="beta" differs from file to file.)

I am still learning Perl regex. What am I doing wrong?

CodePudding user response:

This perl command line should work:

perl -pe 's/(title=)"[^"]+"(.*) name=("[^"]+")/$1$3$2/' file

title="beta" lorem ipsum lorem ipsum
title="delta" Morbi posuere metus purus

Explanation:

  • (title=): Match title= and capture it in group #1
  • "[^"]+": Match a quoted string
  • (.*): Match 0 or more of any characters and capture them in group #2
  • name=: Match the text name=
  • ("[^"]+"): Match a quoted string and capture it in group #3
  • $1$3$2: Replacement part

CodePudding user response:

1st solution: Since you are already using the shell's find command, here is an awk alternative in case that is acceptable. It was written and tested in GNU awk.

awk -v s1="\"" '
match($0,/(title=)"[^"]*" (.*)name="([^"]*)"/,arr){
  print arr[1] s1 arr[3] s1,arr[2]
}
'  Input_file

Explanation: This uses GNU awk's match function, whose optional third argument stores the capture groups of the regex in an array. The regex (title=)"[^"]*" (.*)name="([^"]*)" creates 3 capturing groups, whose values are stored in the array named arr at indices 1, 2 and 3. The print statement then emits those values in the order required by the OP.



2nd solution: In sed, with the same regex and the -E (ERE) option enabled, please try the following code.

sed -E 's/^(title=)"[^"]*" (.*)name="([^"]*)"/\1"\3" \2/' Input_file
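A minimal sketch of the sed command on one sample line from the question, to show what each back-reference contributes. The second command is a hypothetical variant, not part of the answer: it moves the space before name= outside group 2 so that the result carries no trailing space. The variable names are illustrative only.

```shell
line='title="omega" Morbi posuere metus purus name="delta"'

# As in the answer: \2 is "Morbi posuere metus purus " (with its trailing
# space), so the output line ends in a space.
sed_out=$(printf '%s\n' "$line" |
  sed -E 's/^(title=)"[^"]*" (.*)name="([^"]*)"/\1"\3" \2/')
printf '[%s]\n' "$sed_out"
# [title="delta" Morbi posuere metus purus ]

# Variant (an assumption about what is wanted): match the space before
# name= outside the group so it is dropped from the output.
sed_trim=$(printf '%s\n' "$line" |
  sed -E 's/^(title=)"[^"]*" (.*) name="([^"]*)"/\1"\3" \2/')
printf '[%s]\n' "$sed_trim"
# [title="delta" Morbi posuere metus purus]
```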

CodePudding user response:

A bit of syntax: capture with (?<name>pattern) and use the capture with $+{name} (the delimiters may be varied; see perlre). The regex is

s{ title="(?<t>[^"]+)" (?<text>.*?) name="(?<n>[^"]+)" }
 {title=$+{n}$+{text}}x

A full example, with the above regex copy-pasted, to run on the command line

echo title=\"alpha\" lorem ipsum lorem ipsum name=\"beta\" | perl -wpe's{ title="(?<t>[^"]+)" (?<text>.*?) name="(?<n>[^"]+)" }{title=$+{n}$+{text}}x'

prints

title=beta lorem ipsum lorem ipsum 

I am not sure why the first value needs to be captured, as it is in the question, but perhaps there is more to it than shown, so it is captured here as well, into $+{t}. Also see %+, the hash where these captures are available, in perlvar.

Also, the question uses those quotes rather loosely. One can string together '-delimited strings for one command-line program but I'd suggest not to (if that was the intent).
