Regex between digits and strings

Time:09-18

I am trying to extract all digits that refer to teaching experience, which should be 8, 17 and 7. I have tried (years.*?teaching.*?:.*?[0-9]+|\d+.+teaching), but it grabs everything from the first digit onward because of the second alternative.

Sample text:

10 years small business ownership, 10 years sme consulting, 10 years corporate/vocational business training, 8 years teaching experience, years of teaching experience: 17, 7 years teaching/Corporate Training experience
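
To make the problem concrete, here is a minimal sketch that runs the attempted pattern over the sample text. It assumes the pattern was meant with `+` quantifiers, i.e. `[0-9]+` and `\d+.+teaching`; the variable names are only illustrative. Because `.+` is greedy, the second alternative starts at the first digit and stretches to the last occurrence of "teaching", so the whole text collapses into a single match.

```python
import re

text = ("10 years small business ownership, 10 years sme consulting, "
        "10 years corporate/vocational business training, "
        "8 years teaching experience, years of teaching experience: 17, "
        "7 years teaching/Corporate Training experience")

# Assumed reading of the attempted pattern, with the `+` quantifiers in place.
pattern = re.compile(r"(years.*?teaching.*?:.*?[0-9]+|\d+.+teaching)")

matches = pattern.findall(text)

# One long match instead of three short ones: it starts at the first "10"
# and ends at the last "teaching" in the text.
print(len(matches))
print(matches[0])
```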

CodePudding user response:

Keeping your regex as it is, I would approach the problem differently: split the text into smaller pieces first, then apply the regex to each piece instead of to the whole string.

import re

text = '10  years small business ownership, 10  years sme consulting, 10  years corporate/vocational business training, 8 years teaching experience, years of teaching experience: 17, 7  years teaching/Corporate Training experience'

# The pattern from the question, written as a raw string
regex = re.compile(r'(years.*?teaching.*?:.*?[0-9]+|\d+.+teaching)')

# Split on commas and keep only the pieces that mention teaching
pieces = text.split(',')
teaching_pieces = filter(lambda piece: 'teaching' in piece, pieces)
# match() anchors at the start of each stripped piece
experiences = map(lambda piece: regex.match(piece.strip()).group(), teaching_pieces)

print(list(experiences))

You could further modify this to fit your needs.
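
If only the numbers are wanted, one possible modification is to search each teaching-related piece for its first run of digits. This is a sketch rather than the answer's own code: it assumes each piece holds at most one number, and the names `pieces`, `teaching_pieces` and `years` are hypothetical.

```python
import re

text = ("10 years small business ownership, 10 years sme consulting, "
        "10 years corporate/vocational business training, "
        "8 years teaching experience, years of teaching experience: 17, "
        "7 years teaching/Corporate Training experience")

# Split on commas and trim the surrounding whitespace of each piece.
pieces = [piece.strip() for piece in text.split(",")]

# Keep only the pieces that mention teaching (case-insensitive check).
teaching_pieces = [piece for piece in pieces if "teaching" in piece.lower()]

# Take the first run of digits in each piece; pieces without digits are skipped.
years = [m.group() for piece in teaching_pieces if (m := re.search(r"\d+", piece))]

print(years)  # ['8', '17', '7']
```

Splitting first keeps the regex trivial, at the cost of relying on commas as the only separator.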

CodePudding user response:

With the following assumptions:

  • Each comma-separated substring contains at most one number
  • Within a substring, teaching always refers to the years of experience given in that substring

I would put together this pattern and use it with re.findall:

re.findall(r"(?i)(?:,|^)(?=[^,]*?teaching)[^\d,]*(\d+\+?)", s)
  • (?:,|^) The starting point is either ^ (start of string) or a comma
  • (?=[^,]*?teaching) A lookahead that checks whether teaching occurs before the next ,
  • On success, [^\d,]*(\d+\+?) skips any non-digits and captures the number, plus an optional +, into the first group
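
As a quick check, the pattern can be run against the sample text from the question. This is a minimal sketch in which `s` holds the sample text and `years` is a hypothetical name for the result; the pattern is the one above, assuming its quantifiers are `\d+` followed by an optional literal `+`.

```python
import re

s = ("10 years small business ownership, 10 years sme consulting, "
     "10 years corporate/vocational business training, "
     "8 years teaching experience, years of teaching experience: 17, "
     "7 years teaching/Corporate Training experience")

# (?i)                 case-insensitive matching
# (?:,|^)              start at a comma or at the start of the string
# (?=[^,]*?teaching)   only continue if "teaching" appears before the next comma
# [^\d,]*              skip non-digit, non-comma characters
# (\d+\+?)             capture the number and an optional trailing "+"
years = re.findall(r"(?i)(?:,|^)(?=[^,]*?teaching)[^\d,]*(\d+\+?)", s)

print(years)  # ['8', '17', '7']
```

Because the pattern has exactly one capturing group, re.findall returns the captured numbers rather than the full matches.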

See this demo at regex101 (more info on right side) or a Python demo at tio.run
