The regex ptx captures most of what I want. Because I am incompetent at combining many things into one regex, I created a second regex, ptx1, that should additionally capture the following character sequences: One Department, One foreign Department, Two office.
import re

text_list = ' '.join(map(str, text))
ptx = re.compile(r'(\s+something(?:\s+|\\n)*patternx:)(.*)(One\s+foreign)', flags=re.DOTALL | re.MULTILINE)
ptx1 = re.compile(r'(\s+something(?:\s+|\\n)*patternx:)(.*)((One|Two)\s+(?:foreign\s+)*Department|office)', flags=re.DOTALL | re.MULTILINE)
ten = ptx.search(text_list)
eleven = ptx1.search(text_list)
try:
    if ten:
        ten = ten.group(2)
    else:
        ten = None
except:
    pass
Here is what I added before the else above. It didn't work:
elif eleven:
    ten = eleven.group(2)
My question is: how do I need to call the group in the elif statement in order to get the (.*) (that is, text_i_want) content returned? I have the gut feeling that I need to access eleven as if it were a list, because it has so many capturing groups, e.g. eleven[0].group(1), in order to get the first element from the list and then its second group. But that didn't work either.
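For reference, this is how a Match object hands out its groups. It is a minimal sketch assuming Python 3.6 or later, where m[0] is shorthand for m.group(0). The name sample is a hypothetical stand-in for text_list. re.search returns one Match object or None, not a list. Its groups are numbered from 1 in the order their opening parentheses appear.

```python
import re

# Hypothetical sample shaped like one entry of text_list
sample = ' something\npatternx: text_i_want One Department'

# Three capturing groups, like the question's ptx
m = re.search(r'(\s+something\s+patternx:)(.*?)(One\s+Department)', sample, re.DOTALL)

if m:                           # re.search returns None when nothing matches
    print(m.group(2))           # the (.*?) group: ' text_i_want '
    print(len(m.groups()))      # 3: groups() returns every capturing group
    print(type(m[0]))           # <class 'str'>: m[0] is the whole matched text,
                                # so m[0].group(1) raises AttributeError
```

In other words, eleven.group(2) is the correct way to read the (.*) group of ptx1, provided eleven is not None.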
You can think of text_list like this:
text_list = ['...something\npatternx: text_i_want One Department',
             '...something patternx: text_i_want One foreign Department',
             '...something\n patternx: text_i_want Two office']
Answer:
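It looks as if you got tripped up when factoring the alternatives on the right-hand side. In ((One|Two)\s+(?:foreign\s+)*Department|office), the top-level | splits the group into a ...Department branch and a bare office branch, so Two office is never matched as a phrase.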
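A quick way to check this is to run just that last group against a hypothetical sample string. The sketch below compares it with a variant where only the final word is grouped. This variant is an illustration, not the fix proposed here.

```python
import re

sample = 'text_i_want Two office'  # hypothetical input

# The question's tail: the | splits the whole group, so 'office' is its own alternative
bad = re.compile(r'((One|Two)\s+(?:foreign\s+)*Department|office)')
# Grouping the final word keeps the alternation local to Department/office
good = re.compile(r'((?:One|Two)\s+(?:foreign\s+)*(?:Department|office))')

print(bad.search(sample).group(1))   # office
print(good.search(sample).group(1))  # Two office
```

The grouped variant still rejects a bare One foreign and accepts combinations such as One office. That is why the full pattern lists the accepted phrases explicitly.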
You need to use
\bsomething\s+patternx:(.*?)\b(?:One\s+foreign|One\s+Department|One\s+foreign\s+Department|Two\s+office)\b
which can be shortened to
\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b
See the regex demo. Details:
- \bsomething\s+patternx: matches the whole word something, one or more whitespaces, and the patternx: string
- (.*?) is Group 1: any zero or more chars, as few as possible
- \b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b matches either One Department, One foreign, One foreign Department, or Two office as whole words.
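The breakdown above maps one-to-one onto a re.VERBOSE version of the same pattern. This is a sketch, not part of the original answer. The names rx_verbose, rx_compact and sample are illustrative, and the final check only confirms that both forms agree on one sample string.

```python
import re

# Same pattern as the shortened one, spread out with re.VERBOSE so each piece
# can carry its own comment (unescaped whitespace and # comments are ignored)
rx_verbose = re.compile(r"""
    \bsomething\s+patternx:    # whole word 'something', 1+ whitespace, 'patternx:'
    (.*?)                      # Group 1: zero or more chars, as few as possible
    \b(?:
        One\s+(?:Department|foreign(?:\s+Department)?)  # One Department / One foreign [Department]
      | Two\s+office                                    # Two office
    )\b                        # the stop phrase must end on a word boundary
""", re.DOTALL | re.VERBOSE)

rx_compact = re.compile(
    r'\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b',
    re.DOTALL,
)

sample = ' something\npatternx: text_i_want One foreign Department'
print(rx_verbose.findall(sample) == rx_compact.findall(sample))  # True
```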
See the Python demo:
import re
text_list = [' something\npatternx: text_i_want One Department',' something patternx: text_i_want One foreign Department',' something\n patternx: text_i_want Two office']
text_list = ' '.join(map(str, text_list))
rx = r'\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b'
print(re.findall(rx, text_list, re.DOTALL))
# => [' text_i_want ', ' text_i_want ', ' text_i_want ']
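To plug this back into the question's if/else logic, here is a minimal sketch. It is an adaptation, not the answerer's code. It turns the non-capturing group into a second capturing group, so each match also reports which stop phrase ended it. It uses search(...).group(1) for the single-value case and finditer for all occurrences. The .strip() calls assume the surrounding spaces are unwanted.

```python
import re

text = [' something\npatternx: text_i_want One Department',
        ' something patternx: text_i_want One foreign Department',
        ' something\n patternx: text_i_want Two office']
text_list = ' '.join(map(str, text))

# Same pattern, but the stop phrase is captured too (Group 2)
rx = re.compile(
    r'\bsomething\s+patternx:(.*?)\b(One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b',
    re.DOTALL,
)

# One search replaces the ptx/ptx1 pair: group(1) is the (.*?) content
m = rx.search(text_list)
ten = m.group(1).strip() if m else None

# finditer yields one Match object per occurrence
pairs = [(mo.group(1).strip(), mo.group(2)) for mo in rx.finditer(text_list)]
print(ten)  # text_i_want
print(pairs)
```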