Right now I have a test.txt file that I'm reading in. It contains several newline characters, so I am using re.DOTALL. How can I combine consecutive matches of two patterns into pairs?
test.txt:
blah blah blah||| blah blah||
Key_one1_end || blah blah
blah blah || blah
blah blah |||||| blah blah Value_number : 10
blah blah blah||| blah blah||
Key_two2_end || blah blah
blah blah || blah
Value_number : f
This is my code:
import re
with open(r'path/to/file/test.txt') as f:
    # Alternation: each match fills only one of the two capture groups
    matches = re.findall(r'(Key_\w*_end)|(Value_number...\w*)', f.read(), re.DOTALL)
print(matches)
Output: [('Key_one1_end', ''), ('', 'Value_number : 10'), ('Key_two2_end', ''), ('', 'Value_number : f')]
I want the output to look like this:
[('Key_one1_end','Value_number : 10'), ('Key_two2_end', 'Value_number : f')]
Any suggestions?
CodePudding user response:
pattern1|pattern2
matches either of the patterns, so each match in the list contains only one of them: one capture group is filled and the other is an empty string.
If you want both in a single match, don't use alternation. Use a non-greedy wildcard (.*?) to match the text between the two patterns; with re.DOTALL it can also span the newlines.
matches = re.findall(r'(Key_\w*_end).*?(Value_number...\w*)', f.read(), re.DOTALL)
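To check the combined pattern without a file on disk, here is a minimal sketch that inlines the sample text from the question as a string. The names text, pattern and pairs are only illustrative, and the sketch assumes every key is followed by its own value before the next key appears.

```python
import re

# Sample text copied from the question's test.txt, inlined as a string
# (an assumption made so the sketch runs without the file).
text = """blah blah blah||| blah blah||
Key_one1_end || blah blah
blah blah || blah
blah blah |||||| blah blah Value_number : 10
blah blah blah||| blah blah||
Key_two2_end || blah blah
blah blah || blah
Value_number : f
"""

# One pattern with two capture groups. The lazy .*? consumes as little as
# possible between the key and the next value; re.DOTALL lets it cross newlines.
pattern = re.compile(r'(Key_\w*_end).*?(Value_number...\w*)', re.DOTALL)

# findall returns one (key, value) tuple per match.
pairs = pattern.findall(text)
print(pairs)
# [('Key_one1_end', 'Value_number : 10'), ('Key_two2_end', 'Value_number : f')]
```

If a key could appear without a value after it, the lazy wildcard would run on to the next value in the file and pair it with the wrong key, so that case would need extra handling.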