I'm having trouble writing a regular expression supported by Python to handle this use case.
Imagine you have a text string that is a set of questions and multiple choice answers:
Question 1: What witch-like attributes do you have?
Answer 1:
x Hat
o Pointy Nose
x Float
x Weigh more than a duck
Question 2: Where could this coconut have come from?
Answer 2:
o It migrated
x A European swallow carried it
o An African swallow carried it
x It doesn't matter
... and you would like to parse the above text for only the "x" answers to Question 1 using Regex.
If you had access to PCRE you could do something like this using the \G (last match) anchor:
(?:\G(?!^)|Question 1:)(?:(?!Question 1:|Question 2:)[\s\S])*?\K(?:x\s)([a-z]+)(?=(?:(?!Question 1:)[\s\S])*Question 2:)
...or maybe even something fun using subroutines, e.g. (textbetweentokens)(?1)(textwithx).
But Python's built-in re module doesn't support either of those regex features.
Is there any other way to solve this regex challenge?
Note: There are other questions like this on Stack Overflow, but none that I could find had answers that were usable with Python-supported regex.
CodePudding user response:
Split your text into lines and test each one with str.startswith(). Note that this prints the "x" lines for every question, not only Question 1.
texte = """Question 1: What witch-like attributes do you have?
Answer 1:
x Hat
o Pointy Nose
x Float
x Weigh more than a duck
Question 2: Where could this coconut have come from?
Answer 2:
o It migrated
x A European swallow carried it
o An African swallow carried it
x It doesn't matter"""
lines = texte.splitlines()
for line in lines:
    # Keep only the lines that mark a selected ("x") answer
    if line.startswith('x'):
        print(line)
Output:
x Hat
x Float
x Weigh more than a duck
x A European swallow carried it
x It doesn't matter
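To keep only the Question 1 answers with this line-by-line approach, one option is to track which question the loop is currently inside. The following is a minimal sketch, assuming every question line starts with "Question <number>:" and every selected answer starts with "x " as in the sample. The function name x_answers and its parameters are illustrative and not part of the answer above.

```python
texte = """Question 1: What witch-like attributes do you have?
Answer 1:
x Hat
o Pointy Nose
x Float
x Weigh more than a duck
Question 2: Where could this coconut have come from?
Answer 2:
o It migrated
x A European swallow carried it
o An African swallow carried it
x It doesn't matter"""


def x_answers(text, question="Question 1:"):
    """Return the text of the 'x' answers listed under the given question."""
    found = []
    in_section = False
    for line in text.splitlines():
        if line.startswith("Question "):
            # A new question begins: remember whether it is the one we want.
            in_section = line.startswith(question)
        elif in_section and line.startswith("x "):
            # Drop the leading "x " marker and keep the answer text.
            found.append(line[2:])
    return found


print(x_answers(texte))
```

Including the trailing colon in the question prefix keeps "Question 1:" from also matching a line such as "Question 10:".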
CodePudding user response:
You could match each line that starts with "x" but include a look-ahead assertion that checks that the next question is question 2:
(?:^x\s)(.*)(?=\s+(?:^(?!Question).*\s+)*^Question 2)
Use the re.M flag so that ^ matches at the start of each line.
This assumes of course that the question that precedes question 2 is question 1.
import re
s = """Question 1: What witch-like attributes do you have?
Answer 1:
x Hat
o Pointy Nose
x Float
x Weigh more than a duck
Question 2: Where could this coconut have come from?
Answer 2:
o It migrated
x A European swallow carried it
o An African swallow carried it
x It doesn't matter
"""
answers = re.findall(r"(?:^x\s)(.*)(?=\s+(?:^(?!Question).*\s+)*^Question 2)", s, re.M)
print(answers)
Output:
['Hat', 'Float', 'Weigh more than a duck']
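The look-ahead pattern relies on Question 2 directly following Question 1. A possible alternative that avoids that assumption is to work in two steps: first isolate the block belonging to one question, then collect the "x" lines inside it. This is only a sketch; the function name x_answers_regex and its parameters are made up for illustration, and it assumes every question line starts with "Question <number>:".

```python
import re

s = """Question 1: What witch-like attributes do you have?
Answer 1:
x Hat
o Pointy Nose
x Float
x Weigh more than a duck
Question 2: Where could this coconut have come from?
Answer 2:
o It migrated
x A European swallow carried it
o An African swallow carried it
x It doesn't matter
"""


def x_answers_regex(text, number):
    """Return the 'x' answers of question `number` using two regex passes."""
    # Step 1: grab everything from "Question N:" up to the next question
    # line, or to the end of the text if it is the last question.
    block = re.search(
        rf"^Question {number}:.*?(?=^Question \d+:|\Z)",
        text,
        re.M | re.S,
    )
    if block is None:
        return []
    # Step 2: within that block only, capture the text after each "x" marker.
    return re.findall(r"^x[ \t]+(.*)$", block.group(0), re.M)


print(x_answers_regex(s, 1))
```

Both passes use only features available in the standard re module (lazy quantifiers, look-ahead, and the re.M and re.S flags), so nothing like \G or \K is needed.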