How do I get the text between the end of a question (everything after the ?) and the next line that starts with "Question"?
The answers are separated by newlines.
import re
text = """Which feature is not part of the linux system?
pipe
2) dirx
ls
ps
Question 2 ("""
# My attempt so far (it does not produce the expected list of answers)
output = re.findall(r'\?\s*(.*?)\s*Question\)', text).split('\n')
print(output)
CodePudding user response:
You may use this regex to match the required text between ? and Question:
(?s)(?<=\?).+?(?=\nQuestion )
Explanation:
(?s): Enable DOTALL mode so that . also matches line breaks
(?<=\?): Lookbehind to assert that we have ? just before the current position
.+?: Match 1+ of any characters, including line breaks, as few as possible
(?=\nQuestion ): Lookahead to assert that we have a line break followed by Question and a space ahead of the current position
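As a minimal sketch of how this pattern could be applied, the snippet below runs it against the sample text from the question and turns the match into a list of answers. The variable names and the strip()/splitlines() post-processing are assumptions added for illustration; they are not part of the original answer.

```python
import re

text = (
    "Which feature is not part of the linux system?\n"
    "pipe\n"
    "2) dirx\n"
    "ls\n"
    "ps\n"
    "Question 2 ("
)

# Lazily match everything after "?" up to the newline that precedes "Question ".
pattern = re.compile(r"(?s)(?<=\?).+?(?=\nQuestion )")

match = pattern.search(text)
# strip() drops the leading newline; splitlines() yields one answer per line.
answers = match.group(0).strip().splitlines() if match else []
print(answers)
```

Because the lookbehind and lookahead consume nothing, the match contains only the answer lines plus the newline right after the question mark.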
CodePudding user response:
You might use a capture group, matching the lines in between that do not end with a question mark and do not start with Question:
^.*\?((?:\n(?!.*\?$|Question\b).*)*)
^    Start of string
.*\?    Match a line ending with ?
(    Capture group 1 (which will be returned by re.findall)
(?:    Non-capture group to repeat as a whole
\n(?!.*\?$|Question\b)    Match a newline, and assert that the line does not end with ? or start with Question
.*    If the assertions are true, match the whole line
)*    Close the non-capture group and optionally repeat
)    Close group 1
For example
import re
text = ("Which feature is not part of the linux system?\n"
"pipe\n"
"2) dirx\n"
"ls\n"
"ps\n\n"
"Question 2 (")
output = re.findall(r'^.*\?((?:\n(?!.*\?$|Question\b).*)*)', text)
print(output)
Output
['\npipe\n2) dirx\nls\nps\n']
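If the input holds several questions, the same pattern might be reused roughly as follows. This is only a sketch under assumptions: the second question and the "(1 point)" headers are invented sample data, and re.MULTILINE is added here (it is not in the example above) so that ^ and $ apply to every line rather than only to the whole string.

```python
import re

# Hypothetical input: the "(1 point)" headers and the second question are made up.
text = (
    "Question 1 (1 point)\n"
    "Which feature is not part of the linux system?\n"
    "pipe\n"
    "2) dirx\n"
    "ls\n"
    "ps\n"
    "\n"
    "Question 2 (1 point)\n"
    "Which command lists files?\n"
    "ls\n"
    "cd\n"
)

# Same pattern as above; MULTILINE lets ^ and $ match at every line boundary.
pattern = re.compile(r"^.*\?((?:\n(?!.*\?$|Question\b).*)*)", re.MULTILINE)

# One list of answers per question, with blank lines dropped.
answers = [
    [line for line in block.splitlines() if line.strip()]
    for block in pattern.findall(text)
]
print(answers)
```

Note that with re.MULTILINE any answer line that itself ends with ? stops the block, so this only fits data where answers never end with a question mark.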