I am trying to extract all numbers referring to teaching experience, which should be 8, 17 and 7. I have tried `(years.*?teaching.*?:.*?[0-9]+|\d+.+teaching)`, but it grabs everything from the first digit onward because of the second alternative.
Sample text:
10 years small business ownership, 10 years sme consulting, 10 years corporate/vocational business training, 8 years teaching experience, years of teaching experience: 17, 7 years teaching/Corporate Training experience
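The following snippet shows the over-matching, assuming the pattern from the question reads `(years.*?teaching.*?:.*?[0-9]+|\d+.+teaching)`. The `.+` in the second alternative is greedy, so starting at the first digit (`10`) it runs to the last `teaching` in the whole string and `re.findall` returns a single huge match.

```python
import re

text = ('10 years small business ownership, 10 years sme consulting, '
        '10 years corporate/vocational business training, 8 years teaching experience, '
        'years of teaching experience: 17, 7 years teaching/Corporate Training experience')

# The question's pattern; the second alternative \d+.+teaching is greedy.
pattern = r'(years.*?teaching.*?:.*?[0-9]+|\d+.+teaching)'

matches = re.findall(pattern, text)
print(len(matches))  # 1
# The single match spans from the first "10" to the last "teaching".
print(matches[0].startswith('10 years'), matches[0].endswith('7 years teaching'))  # True True
```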
Answer:
Keeping your regex as it is, you can approach the problem differently: break the text apart on commas first and then run the regex on each smaller string instead of the whole text.
```python
import re

text = '10 years small business ownership, 10 years sme consulting, 10 years corporate/vocational business training, 8 years teaching experience, years of teaching experience: 17, 7 years teaching/Corporate Training experience'
regex = re.compile(r'(years.*?teaching.*?:.*?[0-9]+|\d+.+teaching)')

# Split on commas so each regex attempt only sees one fragment
lines = text.split(',')
# Keep only the fragments that mention teaching
filteredLines = filter(lambda line: 'teaching' in line, lines)
# regex.match anchors at the start of each stripped fragment
experiences = map(lambda line: regex.match(line.strip()).group(), filteredLines)
print(list(experiences))
```
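This prints `['8 years teaching', 'years of teaching experience: 17', '7 years teaching']`, i.e. the matched fragments rather than bare numbers, so you could further modify it to pull out just the digits.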
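One way to take the split-first idea further, as a sketch that assumes each teaching-related fragment contains only one relevant number: drop the capturing regex and take the first run of digits from every fragment that mentions teaching. The helper name `teaching_years` is hypothetical, and lowercasing the fragment (so `Teaching` also counts) is an added assumption.

```python
import re

text = ('10 years small business ownership, 10 years sme consulting, '
        '10 years corporate/vocational business training, 8 years teaching experience, '
        'years of teaching experience: 17, 7 years teaching/Corporate Training experience')

def teaching_years(text):
    """Return the first number in each comma-separated fragment mentioning 'teaching'."""
    years = []
    for piece in text.split(','):
        if 'teaching' not in piece.lower():
            continue  # skip fragments unrelated to teaching
        number = re.search(r'\d+', piece)
        if number:  # assumption: at most one relevant number per fragment
            years.append(number.group())
    return years

print(teaching_years(text))  # ['8', '17', '7']
```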
Answer:
With the following assumptions:
- Each comma-separated substring contains no more than one number
- `teaching` always refers to years of experience within its own comma-separated substring

I would put together this pattern and use it with `re.findall`:
```python
re.findall(r"(?i)(?:,|^)(?=[^,]*?teaching)[^\d,]*(\d+\+?)", s)
```
- `(?:,|^)` The starting point is either `^` (start of string) or a comma
- `(?=[^,]*?teaching)` A lookahead that checks whether `teaching` occurs before the next `,`
- On success, `[^\d,]*(\d+\+?)` skips non-digits and captures the number plus an optional `+`
See this demo at regex101 (more info on right side) or a Python demo at tio.run
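For a quick local check, here is the pattern from this answer run on the question's sample, plus one hypothetical input (not from the question) with `20+` and a capitalised `Teaching` to exercise the optional `+` and the `(?i)` flag.

```python
import re

# For each comma-delimited fragment mentioning "teaching" (case-insensitive),
# capture the first number and an optional trailing "+".
PATTERN = re.compile(r"(?i)(?:,|^)(?=[^,]*?teaching)[^\d,]*(\d+\+?)")

sample = ('10 years small business ownership, 10 years sme consulting, '
          '10 years corporate/vocational business training, 8 years teaching experience, '
          'years of teaching experience: 17, 7 years teaching/Corporate Training experience')

print(PATTERN.findall(sample))  # ['8', '17', '7']

# Hypothetical extra input showing the optional "+" and case-insensitivity.
print(PATTERN.findall('20+ years Teaching, 5 years sales'))  # ['20+']
```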