Year and Text Parsing with Regex

Time:03-08

I am trying to extract entries from a text for later processing with NLP. Each entry starts with a date range in the form "Apr 2022 - Present" or "Apr 1874 - Dec 1958" and includes all of the text that follows, up to the next date range.

Example:

Apr 2018 - Present lm Senior NurseWoodfield Hospital, Ipswich© Provided daily care for 6 elderly patients after major surgical procedures inan ICU unit by monitoring vital signs and administering medication© Collaborated with doctors to develop long-term care plans after hospitalstays.Supervised 4 Certified Nursing Assistants (CNAs) working in the unitFeb 2014 - Mar 2018 mm Registered NurseAshfield Care Home, Kent© Worked with the unit manager to take care of 36 frail and elderly patientswith complex health needs.© Responsible for administering medicine safely, in accordance with theNursing Midwifery Council guidelines.© Managed the unit's revenue and budget, including the allocation of funds forpatient care, equipment, and staff supplies.Nov 2043 - Jan 2014 lm Healthcare AssistantChase Care Home, Suffolk« Responsible for the safety and well-being of elderly people with dementiaand challenging behaviour.Worked with palliative care teams to help deliver end of life care to patients.

Expected result: ["Apr 2018 - Present lm Senior NurseWoodfield Hospital, Ipswich© Provided daily care for 6 elderly patients after major surgical procedures inan ICU unit by monitoring vital signs and administering medication© Collaborated with doctors to develop long-term care plans after hospitalstays.Supervised 4 Certified Nursing Assistants (CNAs) working in the unit", "Feb 2014 - Mar 2018 mm Registered NurseAshfield Care Home, Kent© Worked with the unit manager to take care of 36 frail and elderly patientswith complex health needs.© Responsible for administering medicine safely, in accordance with theNursing Midwifery Council guidelines.© Managed the unit's revenue and budget, including the allocation of funds forpatient care, equipment, and staff supplies.", "Nov 2043 - Jan 2014 lm Healthcare AssistantChase Care Home, Suffolk« Responsible for the safety and well-being of elderly people with dementiaand challenging behaviour.Worked with palliative care teams to help deliver end of life care to patients."]

This is the code I wrote, but I am having trouble fixing it:

import re

year_pattern = re.compile(r"((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[.]?[\s-]\d{4}) - (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[.]?[\s-]\d{4} | (Present |present)")

year = ''.join(year_pattern.findall(text)).strip()

CodePudding user response:

Probably not the best looking solution but this worked for me:

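import re

# Zero-width lookahead: split just before every month abbreviation.
# Splitting on an empty match needs Python 3.7 or later.
pattern = "(?=Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec|present|Present [0-9]{4} -)"
splitted = re.split(pattern, text)
print(splitted)
result = []
i = 0
# Rejoin the pieces in pairs, e.g. "Feb 2014 - " with "Mar 2018 mm Registered Nurse..."
while i < len(splitted):
  result.append(splitted[i] + splitted[i + 1])
  i += 2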

print(result)
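
The pairing above depends on month abbreviations appearing only inside the date ranges. As a rough sketch of a stricter variant (not part of the answer above), the lookahead can require a whole date range, so a stray "Mar" or "May" inside a word no longer causes a split and no rejoining is needed. The names MONTH, RANGE_START, split_entries and the shortened sample are made up for illustration, and the sketch assumes the ranges are always written as "Mon YYYY - Mon YYYY" or "Mon YYYY - Present".

```python
import re

MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
# Lookahead for a whole range such as "Feb 2014 - Mar 2018" or "Apr 2018 - Present".
RANGE_START = re.compile(rf"(?={MONTH} \d{{4}} - (?:{MONTH} \d{{4}}|[Pp]resent))")

def split_entries(text):
    # re.split on a zero-width pattern needs Python 3.7+.
    # Empty pieces are dropped (one appears when the text starts with a date range).
    # Any non-empty text before the first range is kept as its own piece.
    return [piece for piece in RANGE_START.split(text) if piece]

# Hypothetical shortened sample; "Marched" and "May" must not trigger a split.
sample = (
    "Apr 2018 - Present Senior Nurse. Marched through May audits."
    "Feb 2014 - Mar 2018 Registered Nurse"
)
entries = split_entries(sample)
print(entries)
```

Each element of entries then starts with its own date range and runs up to the next one.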

CodePudding user response:

Would you please try the following:

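import re

# The outer capture group makes split() keep each date range in the result list.
pat = re.compile(r"((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.?[\s-]\d{4} - (?:(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.?[\s-]\d{4}|[Pp]resent))")

# m alternates: text before the first range, a date range, the text after it, the next range, ...
m = pat.split(text)
print([m[i] + m[i + 1] for i in range(1, len(m), 2)])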
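
Since the entries are meant for later NLP processing, it may also help to keep the dates apart from the description. The following is only a sketch under that assumption: it uses re.finditer with the same kind of pattern and slices the text between consecutive matches. The names MONTH, RANGE, parse_entries, the dictionary keys and the shortened sample are hypothetical.

```python
import re

MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
# Group 1 is the start date, group 2 the end date or "Present"/"present".
RANGE = re.compile(rf"({MONTH}\.?[\s-]\d{{4}}) - ({MONTH}\.?[\s-]\d{{4}}|[Pp]resent)")

def parse_entries(text):
    matches = list(RANGE.finditer(text))
    records = []
    for current, following in zip(matches, matches[1:] + [None]):
        # The description runs from the end of this range to the start of the next one.
        stop = following.start() if following else len(text)
        records.append({
            "start": current.group(1),
            "end": current.group(2),
            "description": text[current.end():stop].strip(),
        })
    return records

# Hypothetical shortened sample in the same run-together style as the question's text.
sample = "Apr 2018 - Present Senior NurseFeb 2014 - Mar 2018 Registered Nurse"
records = parse_entries(sample)
for record in records:
    print(record)
```

Text that precedes the first date range is ignored by this sketch, which may or may not be what is wanted.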