Perl regex capture groups and nth occurrence

Time:08-05

I am learning Perl regex and am trying to combine capture groups with targeting the nth occurrence of a string.

Say I have the following:

title="alpha" lorem ipsum lorem ipsum name="beta" Morbi posuere metus purus name=delta Curabitur ullamcorper finibus consectetur name=sigma

I want to change the title attribute to the value that follows the nth name=, e.g. sigma, while keeping all the content in between. Also, the value after name= may or may not be wrapped in double quotes, as in name="beta" or name=sigma.

1st occurrence of name=:

title="beta" lorem ipsum lorem ipsum Morbi posuere metus purus name=delta Curabitur ullamcorper finibus consectetur name=sigma

3rd occurrence of name=:

title="sigma" lorem ipsum lorem ipsum name="beta" Morbi posuere metus purus name=delta Curabitur ullamcorper finibus consectetur

I use:

find . -type f -exec perl -pi -w -e 's/(title=)"?[^"\s]*"?(.*) name="?([^"\/]+)"?/$1"$3"$2/' \{\} \;

This works for the first occurrence of name=.

I cannot figure out how to modify this to target the nth occurrence of name=. I know the basics of targeting the nth occurrence (such as replacing the second abc with xyz), ...

s/abc/ ++$count == 2 ? "xyz" : "abc" /eg

... but I have trouble integrating this into my code above. How do I specify the nth name= and move the value that follows it into the title attribute?

CodePudding user response:

You may use this perl solution:

# 3rd occurrence 
perl -pe 's/(title=)"?[^"\s]*"?((?:.*?\h+name=){3}"?([^"\s]+)"?)/$1"$3"$2/'

title="sigma" lorem ipsum lorem ipsum name="beta" Morbi posuere metus purus name=delta Curabitur ullamcorper finibus consectetur name=sigma

# 2nd occurrence
perl -pe 's/(title=)"?[^"\s]*"?((?:.*?\h+name=){2}"?([^"\s]+)"?)/$1"$3"$2/'

title="delta" lorem ipsum lorem ipsum name="beta" Morbi posuere metus purus name=delta Curabitur ullamcorper finibus consectetur name=sigma

# 1st occurrence
perl -pe 's/(title=)"?[^"\s]*"?((?:.*?\h+name=){1}"?([^"\s]+)"?)/$1"$3"$2/'

title="beta" lorem ipsum lorem ipsum name="beta" Morbi posuere metus purus name=delta Curabitur ullamcorper finibus consectetur name=sigma

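Here (?:.*?\h+name=){N} matches N occurrences of a sub-pattern: any text (as little as possible), followed by one or more horizontal whitespace characters, followed by name=. The value after the Nth name= is captured in group 3 and written into title, while group 2 puts the rest of the matched text back unchanged.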

CodePudding user response:

You can use a pattern in which the {n} quantifier is set by hand: it skips n key=value pairs, so the next pair matched is the one you are interested in.

(title=)"?[^\s="]+"?( (?:.*?[^\s= ]+=[^\s= ]+){0}.*?)[^\s= ]+="?([^\s= "]+)"?\h*
                                              ^^^

See for example a regex demo for zero repetitions and a regex demo for 1 repetition.
