Text:
3. MANAGEMENT, FOOD EMPLOYEE Comments 234: FOUND NO EMPLOYEE ISSUED. | 5. PROCEDURES FOR RESPONDING TO VOMITING AND DIARRHEAL EVENTS - Comments: | 10. ADEQUATE HANDWASHING SINKS 7-38-030(C), NO CITATION ISSUED. | 47. FOOD & NON-FOOD
Background: the input items are separated by |. The goal is to find the first number and every number that directly follows a | separator.
The desired output is [3, 5, 10, 47].
Note: 234 and 7-38-030 must not be matched.
CodePudding user response:
import re

def multi_re_find(patterns, phrase):
    '''
    Takes in a list of regex patterns
    Prints a list of all matches
    '''
    for pattern in patterns:
        print('Searching the phrase using the re check: %r' % pattern)
        print(re.findall(pattern, phrase))
        print('\n')

# text holds the input string shown above
test_patterns = [r'\W+\d+']
multi_re_find(test_patterns, text)
Outcome: [' 234', '. | 5', ': | 10', ' 7', '-38', '-030', '. | 47']
My approach missed the number 3 and wrongly included 234, 7, -38, and -030 in the result.
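A small sketch of why this pattern behaves that way, using a shortened sample string (an assumption for brevity, not the full input): `\W+` requires at least one non-word character before the digits, so a number at the very start can never match, and every match also carries its leading punctuation.

```python
import re

# Shortened sample in the same shape as the input above (hypothetical, for brevity).
sample = "3. FOOD Comments 234: ISSUED. | 5. SINKS 7-38-030(C)"

# \W+ needs one or more non-word characters before the digits, so the
# leading "3" is skipped and each match includes the preceding punctuation.
with_prefix = re.findall(r"\W+\d+", sample)
print(with_prefix)

# Dropping \W+ does find the leading "3", but still picks up every other number.
digits_only = re.findall(r"\d+", sample)
print(digits_only)
```

So the pattern needs a condition on what surrounds the number, not just on the character before it.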
CodePudding user response:
You can use a regex with a lookahead:
s = '3. MANAGEMENT, FOOD EMPLOYEE Comments 234: FOUND NO EMPLOYEE ISSUED. | 5. PROCEDURES FOR RESPONDING TO VOMITING AND DIARRHEAL EVENTS - Comments: | 10. ADEQUATE HANDWASHING SINKS 7-38-030(C), NO CITATION ISSUED. | 47. FOOD & NON-FOOD'
import re
re.findall(r'\d+(?=\.)', s)
Or, to make sure the number is at the beginning of the string or right after a |, you can also add a lookbehind:
re.findall(r'(?:(?<=^)|(?<=\| ))\d+(?=\.)', s)
Output:
['3', '5', '10', '47']
And to get a list of integers:
list(map(int, re.findall(r'\d+(?=\.)', s)))
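As an alternative sketch that does not rely on lookarounds, you could split on | and take the number that starts each piece. The helper name `leading_numbers` is made up for illustration, and this version assumes every wanted number is the first thing in its segment; unlike the regex above, it does not require the trailing period.

```python
import re

def leading_numbers(text):
    """Return the integer that starts each '|'-separated segment, if any."""
    numbers = []
    for segment in text.split("|"):
        # re.match only matches at the start of the stripped segment
        m = re.match(r"\d+", segment.strip())
        if m:
            numbers.append(int(m.group()))
    return numbers

s = ('3. MANAGEMENT, FOOD EMPLOYEE Comments 234: FOUND NO EMPLOYEE ISSUED. | '
     '5. PROCEDURES FOR RESPONDING TO VOMITING AND DIARRHEAL EVENTS - Comments: | '
     '10. ADEQUATE HANDWASHING SINKS 7-38-030(C), NO CITATION ISSUED. | '
     '47. FOOD & NON-FOOD')

result = leading_numbers(s)
print(result)  # [3, 5, 10, 47]
```

Segments that do not begin with a digit are skipped, so 234 and 7-38-030 are never considered.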