Regex to remove duplicate numbers from a string

I have produced a data set with codes separated by pipe symbols, and I realized there are many duplicates in each row. Here are four example rows (the regex is applied to each row individually in KNIME):

0612|0613|061|0612|0612
0211|0612|021|0212|0211|0211
0111|0111
0511|0512|0511|0511|0521|0512|0511

I am trying to build a regex that removes the duplicate code numbers from each row. I tested \b(\d+)\b.*\b\1\b from a different thread here, but that expression does not keep the other codes. The desired outputs for the example rows above would be:

0612|0613|061
0211|0612|021|0212
0111
0511|0512|0521|0512

Appreciate your help

CodePudding user response：

I don't know which regex engine KNIME uses.

To do it in one pass you probably need an engine that supports variable-length lookbehind, e.g. .NET:

\|(\d+)\b(?<=\b\1\b.*?\1)

See this demo at Regexstorm (tick "replace matches with" and click on "context"). Replacing the matches with an empty string gives:

0612|0613|061
0211|0612|021|0212
0111
0511|0512|0521
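
If the engine turns out not to support variable-length lookbehind, the same keep-the-first-occurrence result can be reached without a regex at all. Below is a minimal Python sketch of that idea; Python is just my choice for illustration (nothing KNIME prescribes), and the function name dedupe_keep_first is made up.

```python
def dedupe_keep_first(row: str, sep: str = "|") -> str:
    """Keep only the first occurrence of each code, preserving order."""
    # dict.fromkeys drops repeated keys and keeps insertion order
    # (guaranteed for dicts since Python 3.7).
    return sep.join(dict.fromkeys(row.split(sep)))


rows = [
    "0612|0613|061|0612|0612",
    "0211|0612|021|0212|0211|0211",
    "0111|0111",
    "0511|0512|0511|0511|0521|0512|0511",
]

for row in rows:
    print(dedupe_keep_first(row))
```

Codes are compared as whole strings, so 061 and 0612 stay distinct, just as the \b anchors ensure in the regex.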

With a lookahead you can get unique codes per row too, but the other way round: the last occurrence of each code is kept instead of the first, so the order differs from your desired results.

\b(\d+)\|(?=.*?\b\1\b)

Another demo on regex101

0613|061|0612
0612|021|0212|0211
0111
0521|0512|0511
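
The lookahead variant needs no lookbehind, so it also works in engines such as Python's re module. A small sketch, assuming Python for illustration; the name keep_last is hypothetical.

```python
import re

# Match a whole code plus its trailing pipe, but only if the same
# whole code shows up again later in the row.
KEEP_LAST = re.compile(r"\b(\d+)\|(?=.*?\b\1\b)")


def keep_last(row: str) -> str:
    """Remove every occurrence of a code except its last one."""
    return KEEP_LAST.sub("", row)


rows = [
    "0612|0613|061|0612|0612",
    "0211|0612|021|0212|0211|0211",
    "0111|0111",
    "0511|0512|0511|0511|0521|0512|0511",
]

for row in rows:
    print(keep_last(row))
```

The \b after \1 inside the lookahead is what stops 061 from being treated as a duplicate of 0612.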

CodePudding user response：

Based on the expected output shown, you can use this regex:

(\|\d+)\1(?:((?:\|\d+)*)\1)?(?=\||$)|^(\d+)\|(?=\3\b)

Replacement string is: $2

RegEx Demo
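
To try the pattern outside the demo, here is a hedged Python sketch. Python writes the replacement as \g<2> rather than $2, and from Python 3.5 on an unmatched group 2 is substituted as an empty string. The function name collapse_repeats is only illustrative.

```python
import re

# Same pattern as above; group 2 captures the codes sitting between
# an adjacent repeat and a later occurrence of the same code.
PATTERN = re.compile(r"(\|\d+)\1(?:((?:\|\d+)*)\1)?(?=\||$)|^(\d+)\|(?=\3\b)")


def collapse_repeats(row: str) -> str:
    """Apply the pattern and keep only what group 2 captured."""
    return PATTERN.sub(r"\g<2>", row)


rows = [
    "0612|0613|061|0612|0612",
    "0211|0612|021|0212|0211|0211",
    "0111|0111",
    "0511|0512|0511|0511|0521|0512|0511",
]

for row in rows:
    print(collapse_repeats(row))
```

On the four sample rows this reproduces the expected output exactly as posted in the question, including the second 0512 in the last row.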