Get all substrings between two markers containing a keyword

Time:01-04

If I have a string like

b*&^6bolyb{[--9_(marker1JN9&[7&9bkey=- )*.,mljmarker2,pi*[80[)(Mmp0oiymarker1ojm)*[marker2,;i0m980m.9u090marker1*(7hp0Key0()mu90marker2

how do I extract each part between marker1 and marker2 that contains key (or 'Key', or any other variation in case)?

So I'd like the code to return:

['JN9&[7&9bkey=- )*.,mlj', '*(7hp0Key0()mu90']

CodePudding user response:

We can use re.findall here:

import re
inp = "b*&^6bolyb{[--9_(marker1JN9&[7&9bkey=- )*.,mljmarker2,pi*[80[)(Mmp0oiymarker1ojm)*[marker2,;i0m980m.9u090marker1*(7hp0Key0()mu90marker2"
# The outer parentheses capture only the text between the markers, so findall returns it without the markers.
matches = re.findall(r'marker1((?:(?!marker[12]).)*[kK]ey(?:(?!marker[12]).)*)marker2', inp)
print(matches)  # ['JN9&[7&9bkey=- )*.,mlj', '*(7hp0Key0()mu90']

The regex pattern used above matches a marker1 ... key ... marker2 sequence without crossing any other marker1 or marker2 in between:

  • marker1 matches "marker1"
  • (?:(?!marker[12]).)* matches any content WITHOUT crossing a "marker1" or "marker2" marker
  • [kK]ey matches "key" or "Key" (other casings such as "KEY" are not covered)
  • (?:(?!marker[12]).)* again matches without crossing a marker
  • marker2 matches "marker2"
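The question also asks for "any other variation in case", which [kK]ey does not cover. Below is a minimal sketch of one way to handle that, assuming Python 3.6+ for the scoped inline flag (?i:...). The function name extract_between and its parameters are hypothetical and not part of the original answer. The markers stay case-sensitive, and only the keyword is matched case-insensitively.

```python
import re

def extract_between(text, start, end, keyword):
    """Return each start...end span (markers excluded) that contains keyword in any letter case."""
    # Escape the inputs so characters like '*' or '[' are treated literally.
    s, e, k = re.escape(start), re.escape(end), re.escape(keyword)
    # Tempered dot: consume one character at a time, never stepping onto a marker.
    inner = rf'(?:(?!{s}|{e}).)*'
    # (?i:...) applies case-insensitivity to the keyword only.
    pattern = rf'{s}({inner}(?i:{k}){inner}){e}'
    return re.findall(pattern, text)

inp = "b*&^6bolyb{[--9_(marker1JN9&[7&9bkey=- )*.,mljmarker2,pi*[80[)(Mmp0oiymarker1ojm)*[marker2,;i0m980m.9u090marker1*(7hp0Key0()mu90marker2"
print(extract_between(inp, "marker1", "marker2", "key"))
# ['JN9&[7&9bkey=- )*.,mlj', '*(7hp0Key0()mu90']
```

Because "." does not match newlines by default, a span that runs across several lines would be skipped unless flags=re.DOTALL is passed to re.findall.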