I am learning Perl regex and am trying to combine capture groups with targeting the nth occurrence of a string.
Say I have the following:
title="alpha" lorem ipsum lorem ipsum name="beta" Morbi posuere metus purus name=delta Curabitur ullamcorper finibus consectetur name=sigma
I want to change the title attribute to the string that follows the nth name=, e.g. sigma, while keeping all the content in between. Also, the value after name= may or may not be wrapped in double quotes, as in name="beta" or name=sigma.
1st occurrence of name=:
title="beta" lorem ipsum lorem ipsum Morbi posuere metus purus name=delta Curabitur ullamcorper finibus consectetur name=sigma
3rd occurrence of name=:
title="sigma" lorem ipsum lorem ipsum name="beta" Morbi posuere metus purus name=delta Curabitur ullamcorper finibus consectetur
I use:
find . -type f -exec perl -pi -w -e 's/(title=)"?[^"\s]*"?(.*) name="?([^"\/]+)"?/$1"$3"$2/' \{\} \;
This works for the first occurrence of name=.
I cannot figure out how to modify this to specify the nth occurrence of name=.
I know the basics of specifying the nth occurrence (such as replacing the second abc with xyz), ...
s/abc/++$count == 2 ? "xyz" : "abc"/eg
... but I have trouble integrating this into my code above. How do I specify the nth name= and move the capture group that follows it into the title attribute?
CodePudding user response:
You may use this Perl solution:
# 3rd occurrence
perl -pe 's/(title=)"?[^"\s]*"?((?:.*?\h+name=){3}"?([^"\s]+)"?)/$1"$3"$2/'
title="sigma" lorem ipsum lorem ipsum name="beta" Morbi posuere metus purus name=delta Curabitur ullamcorper finibus consectetur name=sigma
# 2nd occurrence
perl -pe 's/(title=)"?[^"\s]*"?((?:.*?\h+name=){2}"?([^"\s]+)"?)/$1"$3"$2/'
title="delta" lorem ipsum lorem ipsum name="beta" Morbi posuere metus purus name=delta Curabitur ullamcorper finibus consectetur name=sigma
# 1st occurrence
perl -pe 's/(title=)"?[^"\s]*"?((?:.*?\h+name=){1}"?([^"\s]+)"?)/$1"$3"$2/'
title="beta" lorem ipsum lorem ipsum name="beta" Morbi posuere metus purus name=delta Curabitur ullamcorper finibus consectetur name=sigma
Here (?:.*?\h+name=){N} matches N occurrences of the sub-pattern: any text, followed by 1+ whitespace characters, followed by name=.
CodePudding user response:
You can use a pattern in which you set the quantifier in the {n} part by hand, so that it optionally repeats over key=value pairs until it reaches the one you are interested in.
(title=)"?[^\s="] "?( (?:.*?[^\s= ] =[^\s= ] ){0}.*?)[^\s= ] ="?([^\s= "] )"?\h*
                                              ^^^
See for example a regex demo for zero repetitions and a regex demo for 1 repetition.