Getting text between one or more pairs of strings using regex in Python

Time:09-23

I'm trying to get the text between one or more pairs of strings. For example:

import re
string1 = 'oi sdfdsf a'
string2 = 'biu serdfd e'
pattern = '(oi|biu)(.*?)(a|e)'
substring = re.search(pattern, string1).group(1)

In this case I expect to get "sdfdsf" when I pass string1 to the search function and "serdfd" when I pass string2. Instead I'm getting "oi" or "biu".
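For reference, each pair of parentheses is a numbered capturing group counted from the left, so group(1) is (oi|biu) and the text in between is group(2). Simply switching to group(2) is not a complete fix, though: the lazy .*? stops at the first "a" or "e" it reaches, and "serdfd" itself contains an "e". A quick check with the question's own pattern:

```python
import re

pattern = '(oi|biu)(.*?)(a|e)'
samples = ['oi sdfdsf a', 'biu serdfd e']

# group(2) is the text between the two delimiter groups
middles = [re.search(pattern, s).group(2) for s in samples]
print(middles)  # [' sdfdsf ', ' s']
```

The first result keeps the surrounding spaces, and the second is cut short at the "e" inside "serdfd". Anchoring the middle on whitespace, as both answers below do, avoids this.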

Answer:

Wrapping part of a pattern in plain parentheses creates a capturing group, so the regex captures whatever that part matches. If you need parentheses only to group alternatives such as 'oi|biu', without capturing them, use a non-capturing group '(?:...)'.
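To see how this changes the group numbering, here is a small comparison of the same pattern with and without '?:' on the outer groups. It assumes, as in the sample strings, that the delimiters are separated from the middle text by single spaces:

```python
import re

text = 'oi sdfdsf a'

# All three groups capture, so the wanted text is group 2
capturing = re.search(r'(oi|biu) (.*?) (a|e)', text)
print(capturing.groups())      # ('oi', 'sdfdsf', 'a')

# (?:...) groups without capturing, so the wanted text becomes group 1
non_capturing = re.search(r'(?:oi|biu) (.*?) (?:a|e)', text)
print(non_capturing.groups())  # ('sdfdsf',)
print(non_capturing.group(1))  # sdfdsf
```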

You can change your pattern as follows:

pattern = r'(?:oi|biu)[ \t]+(\w+)[ \t]+(?:a|e)'
substring = re.search(pattern, string1).group(1)  # 'sdfdsf'
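A minimal sketch of applying this approach to several strings, assuming the middle is a single run of word characters with one or more spaces or tabs on each side. extract_between is a hypothetical helper name; it returns None instead of raising AttributeError when nothing matches:

```python
import re

# Outer groups are non-capturing; (\w+) is the only capturing group
PATTERN = re.compile(r'(?:oi|biu)[ \t]+(\w+)[ \t]+(?:a|e)')

def extract_between(text):
    """Return the word between the delimiters, or None if there is no match."""
    m = PATTERN.search(text)
    return m.group(1) if m else None

for s in ['oi sdfdsf a', 'biu serdfd e', 'no delimiters here']:
    print(repr(s), '->', extract_between(s))
```

Compiling the pattern once with re.compile is optional here, since the re module caches recently used patterns, but it keeps the pattern in one named place.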

Answer:

You are placing capture groups around parts of your regex pattern which you don't really want to capture. Consider this version:

import re

inp = ['oi sdfdsf a', 'biu serdfd e']
for i in inp:
    # findall returns only the text captured by the single group
    word = re.findall(r'\b(?:oi|biu) (\S+) (?:a|e)\b', i)[0]
    print(i + ' => ' + word)

Here the groups around the delimiter words on the left and right are made non-capturing with (?:...), and a single capturing group surrounds the term you want, so findall returns just that term. This prints:

oi sdfdsf a => sdfdsf
biu serdfd e => serdfd
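If the delimiters must come in specific pairs (assumed here: "oi" closes with "a" and "biu" closes with "e"), the alternations (?:oi|biu) ... (?:a|e) are looser than that, since they also accept "oi sdfdsf e". One way to enforce the pairing is to build one alternative per pair with re.escape, and to use re.finditer to collect every match in a longer string. A sketch under those assumptions; PAIRS and between_pairs are hypothetical names:

```python
import re

# Hypothetical (start, end) delimiter pairs; adjust to the real data
PAIRS = [('oi', 'a'), ('biu', 'e')]

def between_pairs(text, pairs=PAIRS):
    """Return every whitespace-free token enclosed by a matching start/end pair."""
    # One alternative per pair, each with its own capturing group, e.g.
    # \boi\s+(\S+)\s+a\b|\bbiu\s+(\S+)\s+e\b
    pattern = '|'.join(
        rf'\b{re.escape(start)}\s+(\S+)\s+{re.escape(end)}\b'
        for start, end in pairs
    )
    results = []
    for m in re.finditer(pattern, text):
        # Only one alternative takes part in each match; the other groups are None
        results.append(next(g for g in m.groups() if g is not None))
    return results

print(between_pairs('oi sdfdsf a'))               # ['sdfdsf']
print(between_pairs('oi sdfdsf a biu serdfd e'))  # ['sdfdsf', 'serdfd']
print(between_pairs('oi sdfdsf e'))               # []
```

re.escape matters only if a delimiter contains regex metacharacters such as "." or "("; for plain words like these it leaves the text unchanged.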