Capture a word `+` same word again but with a prefix

Time:03-01

To all the Regex gurus

Any idea how to handle this beast?

string = 'Position_Name [+|-|/|*] PrevYear Position_Name'

I'm looking for a regex that matches both occurrences of Position_Name. It looks like a duplicate, but it isn't really one: the first occurrence is followed by a special character and then by the same word again, this time with a prefix (here: 'PrevYear'). Position_Name is dynamic and could be any word (e.g. Profit, Sales, etc.), but PrevYear stays constant.

So how can I identify the lines where a position is mentioned twice with a math symbol in the middle (for now), and then capture those three elements? The plus could also be a / (divided by), a minus sign -, or a multiply *, which is what [+|-|/|*] in my example is meant to represent.

PS: I do not mind programming this in two steps ... so first matching and then capturing - but still would need the regex to find these little gems (in hundreds of lines).

Elegantly finding dupes is not the problem, e.g. via \b(\w+) \1\b, but I have come to realize my capabilities are not sufficient for that combo.

Thanks for any hints and support.

CodePudding user response:

You can use

\b(\w+)\b\s*[-+/*]\s*PrevYear\s*\1\b

Details:

  • \b - a word boundary
  • (\w+) - Group 1: one or more word chars
  • \b - a word boundary
  • \s*[-+/*]\s* - a -, +, / or * enclosed with zero or more whitespaces
  • PrevYear - a fixed word
  • \s* - zero or more whitespaces
  • \1 - same value as captured in Group 1
  • \b - a word boundary.
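
To tie this back to the "match first, then capture" part of the question, here is a minimal Python sketch that runs the pattern over a list of lines. It assumes Python's re module. The extra capture groups around the operator and around the PrevYear part are my addition, so that all three elements can be read back. The function name and the sample lines are made up for illustration.

```python
import re

# Same pattern as above, with two extra groups (an assumption, not part of
# the original answer): group 2 captures the operator, group 3 captures the
# prefixed repeat. \1 still refers to the word captured in group 1.
PATTERN = re.compile(r"\b(\w+)\b\s*([-+/*])\s*(PrevYear\s*\1)\b")


def find_prev_year_pairs(lines):
    """Return (name, operator, prefixed_name) for every line that matches."""
    results = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            results.append(match.groups())
    return results


if __name__ == "__main__":
    # Hypothetical sample lines, only for demonstration.
    sample = [
        "Profit - PrevYear Profit",
        "Sales / PrevYear Profit",  # different words, so no match
        "Sales*PrevYearSales",      # whitespace is optional
    ]
    for name, operator, prefixed in find_prev_year_pairs(sample):
        print(name, operator, prefixed)
```

Since \w also matches the underscore, a name like Position_Name is captured as a single word. A line with two different names around the operator is skipped, because \1 must repeat exactly what Group 1 captured.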