I'm trying to write a regex that matches any duplicate words (alphanumeric, possibly containing dashes) in some YAML, using a PCRE tool.
I have found [1] a consecutive, duplicate regex matcher:
(?<=,|^)([^,]*)(,\1)+(?=,|$)
It will catch the duplicates in
hello-world,hello-world,goodbye-world,goodbye-world
but not the "hello-world"s in
hello-world,goodbye-world,goodbye-world,hello-world
Could someone help me try to build a regex pattern for the second case (or both cases)?
[1] - https://www.regular-expressions.info/duplicatelines.html
CodePudding user response:
Put an optional ,.* between the capture group and the back-reference.
(?<=,|^)([^,]*)(?:,.*)?(,\1)(?=,|$)
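To see what this pattern returns, here is a minimal Python sketch. Python's re module rejects the variable-width lookbehind (?<=,|^), so the sketch substitutes (?<![^,]) and (?![^,]) as boundary checks that should behave the same on single-line input. That substitution and the helper name first_duplicate_span are assumptions of this sketch, not part of the answer.

```python
import re

# Adaptation of (?<=,|^)([^,]*)(?:,.*)?(,\1)(?=,|$) for Python's re module:
# (?<![^,]) stands in for (?<=,|^) and (?![^,]) stands in for (?=,|$).
PATTERN = re.compile(r"(?<![^,])([^,]*)(?:,.*)?(,\1)(?![^,])")


def first_duplicate_span(line):
    """Return (duplicated word, full matched text), or None if nothing matches."""
    m = PATTERN.search(line)
    return (m.group(1), m.group(0)) if m else None


line = "hello-world,goodbye-world,goodbye-world,hello-world"
print(first_duplicate_span(line))
```

Because the optional ,.* consumes everything between the two copies, the match runs from the first hello-world to the last one, so the goodbye-world pair inside that span is not reported as a separate match.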
CodePudding user response:
You may use this regex:
(?<=^|,)([^,]+)(?=(?>,[^,]*)*,\1(?>,|$))(?=,|$)
RegEx Details:
(?<=^|,) : Assert that we have a , or the start position before the current position
([^,]+) : Match 1+ non-comma characters and capture them in group #1
(?=(?>,[^,]*)*,\1(?>,|$)) : Lookahead to assert that the same value we captured in group #1 appears again ahead of us
(?=,|$) : Assert that we have a , or the end position ahead
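A quick way to check this approach against both sample lines is the Python sketch below. It is an adaptation, not the PCRE pattern verbatim: Python's re module does not accept the variable-width lookbehind (?<=^|,), and atomic groups are only available in newer Python versions, so the sketch uses (?<![^,]) and plain non-capturing groups instead. These should give the same matches on this kind of input, possibly with more backtracking. The name find_duplicates is hypothetical.

```python
import re

# Python-friendly adaptation of
#   (?<=^|,)([^,]+)(?=(?>,[^,]*)*,\1(?>,|$))(?=,|$)
# (?<![^,]) replaces the variable-width lookbehind, and plain non-capturing
# groups replace the atomic groups.
DUPLICATE = re.compile(r"(?<![^,])([^,]+)(?=(?:,[^,]*)*,\1(?:,|$))(?=,|$)")


def find_duplicates(line):
    """Return each word that occurs again later in the comma-separated line."""
    return DUPLICATE.findall(line)


print(find_duplicates("hello-world,hello-world,goodbye-world,goodbye-world"))
print(find_duplicates("hello-world,goodbye-world,goodbye-world,hello-world"))
```

Since the duplicate is only checked inside a lookahead, nothing after the word is consumed, and both the consecutive and the non-consecutive case are reported. Note that a word appearing three times is reported twice, once for each occurrence that still has a later copy.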