I have last names in an XML file and would like to capture only the unique ones. I am starting from this other StackOverflow answer: "Only match unique string occurrences". With the regex below, I am not able to match my strings so that it returns one Adams and one Yellow.
\b(.*<LastName>(.*)<\/LastName>)\b(?![\s\S]*\b\1\b)
<LastName>Adams</LastName>
<LastName>Adams</LastName>
<LastName>Yellow</LastName>
https://regex101.com/r/2wLsm5/1
CodePudding user response:
Does this work for you?
/<LastName>(\w+)<\/LastName>(?!.*<LastName>\1<\/LastName>)/gsm
(note the flags, they're important)
The issue was that your (.*) for matching the name allowed it to match across multiple lines. I replaced it with \w+ so it only matches word characters (depending on your needs, something a little more international might be required, though).
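For anyone trying this outside regex101, here is a minimal Python sketch of the same pattern. It assumes the sample input from the question; the names xml_text, pattern and unique_last_names are made up for illustration. re.DOTALL stands in for the s flag, and findall already returns all matches, so no g flag is needed.

```python
import re

# Sample input from the question: one element per line, with a duplicate.
xml_text = """<LastName>Adams</LastName>
<LastName>Adams</LastName>
<LastName>Yellow</LastName>"""

# re.DOTALL is Python's equivalent of the "s" flag: it lets ".*" in the
# negative lookahead scan across newlines for a later identical element.
pattern = re.compile(
    r"<LastName>(\w+)</LastName>(?!.*<LastName>\1</LastName>)",
    re.DOTALL,
)


def unique_last_names(text):
    """Return each name once; for duplicates, only the last occurrence matches."""
    return pattern.findall(text)


print(unique_last_names(xml_text))  # ['Adams', 'Yellow']
```

Note that the first Adams is the one being skipped: the lookahead rejects any element that has an identical element later in the text.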
CodePudding user response:
You can capture the name of the tag and its content, then use the backreferences in the negative lookahead.
A lazy match .*? for the tag's content helps here.
<(LastName)>(.*?)<\/\1>(?![\s\S]*?<\1>\2<\/\1>)
Test on regex101 here
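As a rough sketch of how this second pattern could be used from Python (the names tag_pattern, unique_tag_contents and sample are hypothetical, and the input is assumed to have one element per line, as in the question):

```python
import re

# Group 1 captures the tag name, group 2 the (lazy) tag content.
# The lookahead rejects a match if the same tag with the same content
# appears anywhere later; [\s\S] matches any character, newlines included.
tag_pattern = re.compile(r"<(LastName)>(.*?)</\1>(?![\s\S]*?<\1>\2</\1>)")


def unique_tag_contents(text):
    """Return the content of each element that has no identical element after it."""
    return [m.group(2) for m in tag_pattern.finditer(text)]


sample = """<LastName>Adams</LastName>
<LastName>Adams</LastName>
<LastName>Yellow</LastName>"""

print(unique_tag_contents(sample))  # ['Adams', 'Yellow']
```

Because the content is matched with .*? rather than \w+, names containing spaces or hyphens are also captured. The flip side is that if several elements share a single line, the lazy group can stretch across tags, so the one-element-per-line assumption matters here.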