I am using Perl regex capture groups to replace a pattern in a large number of files.
File example 1:
title="alpha" lorem ipsum lorem ipsum name="beta"
File example 2:
title="omega" Morbi posuere metus purus name="delta"
I want to turn these into
title="beta" lorem ipsum lorem ipsum
title="delta" Morbi posuere metus purus
using
find . -type f -exec perl -pi -w -e 's/title="(?'one'.*?)"(?'three'.*?)name="(?'two'.*?)"/title="\g{two}"\g{three}/g;' \{\} \;
(Note that (1) the attribute values of title and name are unknown variables, and (2) the content between title="alpha" and name="beta" differs.)
I am still learning Perl regex. What am I doing wrong?
CodePudding user response:
This perl command line should work:
perl -pe 's/(title=)"[^"]+"(.*) name=("[^"]+")/$1$3$2/' file
title="beta" lorem ipsum lorem ipsum
title="delta" Morbi posuere metus purus
Explanation:
(title=) : Match title= and capture in group #1
"[^"]+" : Match a quoted string
(.*) : Match 0 or more of any chars and capture in group #2
 name= : Match name= text
("[^"]+") : Match a quoted string and capture in group #3
$1$3$2 : Replacement part
CodePudding user response:
1st solution: Since you are already using the shell's find command, in case you are OK with awk code, here it is, written and tested in GNU awk.
awk -v s1="\"" '
match($0,/(title=)"[^"]*" (.*)name="([^"]*)"/,arr){
print arr[1] s1 arr[3] s1,arr[2]
}
' Input_file
Explanation: This uses GNU awk's match function, which takes a regex and, as its third argument, an array that receives the captured groups. The regex (title=)"[^"]*" (.*)name="([^"]*)" creates 3 capturing groups, whose values are stored in the array arr at indices 1, 2 and 3. When printing, I arrange those values in the order the OP requires.
2nd solution: In sed, with the same regex and the -E (ERE) option enabled, please try the following code.
sed -E 's/^(title=)"[^"]*" (.*)name="([^"]*)"/\1"\3" \2/' Input_file
CodePudding user response:
A bit of syntax: capture with (?<name>pattern), use with $+{name} (where delimiters may be varied, see it in perlre). The regex is
s{ title="(?<t>[^"]+)" (?<text>.*?) name="(?<n>[^"]+)" }
{title=$+{n}$+{text}}x
A full example, with the above regex copy-pasted, to run on the command line
echo title=\"alpha\" lorem ipsum lorem ipsum name=\"beta\" | perl -wpe's{ title="(?<t>[^"]+)" (?<text>.*?) name="(?<n>[^"]+)" }{title=$+{n}$+{text}}x'
prints
title=beta lorem ipsum lorem ipsum
Not sure why the first one needs to be captured, as in the question, but perhaps there is more to it than shown, so it is captured here as well, into $+{t}. Also see %+, where these captures are available, in perlvar.
Also, the question uses those quotes rather loosely. One can string together '-delimited strings for one command-line program but I'd suggest not to (if that was the intent).
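To see what the quoting does, one can have the shell print the argument that perl would actually receive. A small sketch, using a shortened version of the question's pattern (the variable names seen and safe are only for this example):

```shell
# The inner single quotes end and restart the shell string, so they never reach perl.
seen=$(printf '%s\n' 's/title="(?'one'.*?)"/')
printf '%s\n' "$seen"

# The angle-bracket form contains no single quotes, so it survives intact.
safe=$(printf '%s\n' 's/title="(?<one>.*?)"/')
printf '%s\n' "$safe"
```

The first line printed is s/title="(?one.*?)"/, which is no longer a named capture by the time perl sees it. The second keeps (?<one>.*?) as written.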