Regex to search for unique last names in XML

Time:05-14

I have last names in an XML file, and I would like to capture only the unique ones. I am starting from this other StackOverflow answer: "Only match unique string occurrences". With the pattern below, I am not able to match my strings so that the result is one Adams and one Yellow.

\b(.*<LastName>(.*)<\/LastName>)\b(?![\s\S]*\b\1\b)

              <LastName>Adams</LastName>
              <LastName>Adams</LastName>
              <LastName>Yellow</LastName>

https://regex101.com/r/2wLsm5/1

CodePudding user response:

Does this work for you?

/<LastName>(\w+)<\/LastName>(?!.*<LastName>\1<\/LastName>)/gsm (note the flags, they're important)


The issue was that the (.*) you used to match the name allowed it to match across multiple lines. I replaced it with \w+ so it only matches word characters (depending on your needs, something a little more international might be needed, though).
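As a rough sketch of how this pattern behaves outside regex101, here it is in Python's re module, run against the sample from the question. The variable names are made up for illustration, and re.DOTALL stands in for the s flag (the g flag corresponds to findall; the m flag has no effect here because the pattern has no ^ or $ anchors).

```python
import re

# Sample input from the question.
xml = """
              <LastName>Adams</LastName>
              <LastName>Adams</LastName>
              <LastName>Yellow</LastName>
"""

# \w+ keeps the captured name inside a single tag.
# The negative lookahead rejects a name if the same tag and name
# appear anywhere later in the text, so only the last occurrence
# of each name is matched.
# re.DOTALL lets "." inside the lookahead cross line breaks.
pattern = re.compile(
    r"<LastName>(\w+)</LastName>(?!.*<LastName>\1</LastName>)",
    re.DOTALL,
)

# With one capture group, findall returns the captured names.
unique_last_names = pattern.findall(xml)
print(unique_last_names)  # ['Adams', 'Yellow']
```

Note that \w+ will not match names containing hyphens, apostrophes or spaces, which is the limitation mentioned above.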

CodePudding user response:

You can capture the name of the tag and its content.
Then use the backreferences in the negative lookahead.

A lazy search .*? for the tag's content helps here.

<(LastName)>(.*?)<\/\1>(?![\s\S]*?<\1>\2<\/\1>)
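A minimal Python sketch of this second pattern, assuming the re module and an invented sample that adds a hyphenated name to show that the lazy .*? is not limited to word characters. The sample data and variable names are hypothetical, not from the question.

```python
import re

# Hypothetical sample: the question's data plus a hyphenated name.
xml = """
              <LastName>Adams</LastName>
              <LastName>Smith-Jones</LastName>
              <LastName>Adams</LastName>
              <LastName>Smith-Jones</LastName>
              <LastName>Yellow</LastName>
"""

# Group 1 captures the tag name, group 2 the tag content (lazily).
# The lookahead uses both backreferences to reject any element that
# is repeated later, so each name is matched at its last occurrence.
# [\s\S] matches any character including newlines, so no DOTALL flag
# is needed, and "." in group 2 stays on one line.
pattern = re.compile(r"<(LastName)>(.*?)</\1>(?![\s\S]*?<\1>\2</\1>)")

# The pattern has two groups, so pick group 2 (the name) explicitly.
unique_last_names = [m.group(2) for m in pattern.finditer(xml)]
print(unique_last_names)  # ['Adams', 'Smith-Jones', 'Yellow']
```

Because a name is kept at its last occurrence, the result follows the order of last appearances rather than first appearances.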

