I am trying to extract, from a text, each entry that starts with a date range in the format Apr 2022 - Present or Apr 1874 - Dec 1958, together with the text that follows it up to the next date range, for later processing with NLP.
Example:
Apr 2018 - Present lm Senior NurseWoodfield Hospital, Ipswich© Provided daily care for 6 elderly patients after major surgical procedures inan ICU unit by monitoring vital signs and administering medication© Collaborated with doctors to develop long-term care plans after hospitalstays.Supervised 4 Certified Nursing Assistants (CNAs) working in the unitFeb 2014 - Mar 2018 mm Registered NurseAshfield Care Home, Kent© Worked with the unit manager to take care of 36 frail and elderly patientswith complex health needs.© Responsible for administering medicine safely, in accordance with theNursing Midwifery Council guidelines.© Managed the unit's revenue and budget, including the allocation of funds forpatient care, equipment, and staff supplies.Nov 2043 - Jan 2014 lm Healthcare AssistantChase Care Home, Suffolk« Responsible for the safety and well-being of elderly people with dementiaand challenging behaviour.Worked with palliative care teams to help deliver end of life care to patients.
Expected result: ["Apr 2018 - Present lm Senior NurseWoodfield Hospital, Ipswich© Provided daily care for 6 elderly patients after major surgical procedures inan ICU unit by monitoring vital signs and administering medication© Collaborated with doctors to develop long-term care plans after hospitalstays.Supervised 4 Certified Nursing Assistants (CNAs) working in the unit", "Feb 2014 - Mar 2018 mm Registered NurseAshfield Care Home, Kent© Worked with the unit manager to take care of 36 frail and elderly patientswith complex health needs.© Responsible for administering medicine safely, in accordance with theNursing Midwifery Council guidelines.© Managed the unit's revenue and budget, including the allocation of funds forpatient care, equipment, and staff supplies.", "Nov 2043 - Jan 2014 lm Healthcare AssistantChase Care Home, Suffolk« Responsible for the safety and well-being of elderly people with dementiaand challenging behaviour.Worked with palliative care teams to help deliver end of life care to patients."]
This is the code I wrote, but I am having trouble fixing it:
year_pattern = re.compile(r"((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[.]?[\s-]\d{4}) - (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[.]?[\s-]\d{4} | (Present |present")
year = ''.join(year_pattern.findall(text)).strip()
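A minimal sketch of what goes wrong with the pattern above, assuming Python 3 and the standard `re` module. The quote after `present` ends the string literal before the final `)`, so `re.compile` rejects the pattern as unbalanced. Even with the parenthesis closed, the top-level `|` would make the pattern match either a month-to-month range or a standalone ` Present`, and because the pattern has capturing groups, `findall` would return tuples, which `''.join` cannot join. Grouping only the two end-date alternatives avoids both problems. `MONTH`, `date_range` and the shortened `SAMPLE` text are illustrative names, not from the post.

```python
import re

# The string that the question's code actually passes to re.compile: the quote after
# "present" ends the literal early, so the group "(Present |present" is never closed.
question_pattern = r"((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[.]?[\s-]\d{4}) - (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[.]?[\s-]\d{4} | (Present |present"

try:
    re.compile(question_pattern)
    compiles = True
except re.error:
    compiles = False  # unbalanced parenthesis

# Hypothetical helper: a month abbreviation with an optional trailing dot.
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.?"

# The end date is either another "Mon YYYY" or "Present"/"present"; grouping the two
# alternatives keeps "|" from splitting the whole pattern. Only non-capturing groups
# are used, so findall() returns whole matches as strings.
date_range = re.compile(rf"{MONTH}[\s-]\d{{4}} - (?:{MONTH}[\s-]\d{{4}}|[Pp]resent)")

# Shortened version of the example text from the question.
SAMPLE = (
    "Apr 2018 - Present lm Senior NurseWoodfield Hospital, Ipswich© Provided daily care."
    "Feb 2014 - Mar 2018 mm Registered NurseAshfield Care Home, Kent© Worked with the unit manager."
    "Nov 2043 - Jan 2014 lm Healthcare AssistantChase Care Home, Suffolk« Responsible for safety."
)

ranges = date_range.findall(SAMPLE)
print(ranges)
```

This only locates the date ranges; capturing the text after each one is the remaining step.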
Answer:
Probably not the most elegant solution, but this worked for me:
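import re

# Zero-width split before every month abbreviation (and "present"); requires Python 3.7+
pattern = "(?=Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec|present|Present [0-9]{4} -)"
splitted = re.split(pattern, text)
print(splitted)
result = []
i = 0
# Join the fragments two at a time to rebuild each entry
while i < len(splitted):
    result.append(splitted[i] + splitted[i + 1])
    i += 2
print(result)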
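The pairing loop above happens to work for the sample because the lookahead never fires before "Present" (the `Present [0-9]{4} -` alternative requires digits after it), so the leading empty string that `re.split` produces lines up with the first entry. A month abbreviation inside a description (for example "Mar" in "Market") would add an extra split and shift the pairing. A sketch of a variant, assuming every entry begins with "Mon YYYY - ": split only where a start date is followed by " - ", so no pairing is needed. `split_entries` and `SAMPLE` are hypothetical names.

```python
import re

MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.?"

# Zero-width split point: only where a *start* date is followed by " - ",
# so end dates such as "Mar 2018" are never split off. Needs Python 3.7+.
start_of_entry = re.compile(rf"(?={MONTH}[\s-]\d{{4}} - )")


def split_entries(text):
    # re.split yields a leading "" when the text begins with a date; drop empty pieces.
    return [piece for piece in start_of_entry.split(text) if piece]


# Shortened version of the example text from the question.
SAMPLE = (
    "Apr 2018 - Present lm Senior NurseWoodfield Hospital, Ipswich© Provided daily care."
    "Feb 2014 - Mar 2018 mm Registered NurseAshfield Care Home, Kent© Worked with the unit manager."
    "Nov 2043 - Jan 2014 lm Healthcare AssistantChase Care Home, Suffolk« Responsible for safety."
)

for entry in split_entries(SAMPLE):
    print(entry)
```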
Answer:
Would you please try the following:
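import re
pat = re.compile(r"((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.?[\s-]\d{4} - (?:(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.?[\s-]\d{4}|[Pp]resent))")
# Because the pattern is one capturing group, split() keeps each date range: [prefix, range1, text1, range2, text2, ...]
m = pat.split(text)
print([m[i] + m[i + 1] for i in range(1, len(m), 2)])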
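If the date range and the description are needed separately for the NLP step, a variant sketch (a swapped-in technique, not part of either answer) uses `re.findall` with a lazy `.*?` and a lookahead for the next date range or the end of the text (`\Z`). It assumes every entry starts with a range in the formats from the question; `extract_entries` and `SAMPLE` are hypothetical names.

```python
import re

MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.?"
RANGE = rf"{MONTH}[\s-]\d{{4}} - (?:{MONTH}[\s-]\d{{4}}|[Pp]resent)"

# Group 1: the date range. Group 2: everything after it, matched lazily up to the
# next date range or the end of the text. DOTALL lets "." cross newlines too.
entry_pattern = re.compile(rf"({RANGE})(.*?)(?={RANGE}|\Z)", re.DOTALL)


def extract_entries(text):
    """Return (date_range, description) pairs; the description is stripped of outer spaces."""
    return [(dates, body.strip()) for dates, body in entry_pattern.findall(text)]


# Shortened version of the example text from the question.
SAMPLE = (
    "Apr 2018 - Present lm Senior NurseWoodfield Hospital, Ipswich© Provided daily care."
    "Feb 2014 - Mar 2018 mm Registered NurseAshfield Care Home, Kent© Worked with the unit manager."
    "Nov 2043 - Jan 2014 lm Healthcare AssistantChase Care Home, Suffolk« Responsible for safety."
)

for dates, body in extract_entries(SAMPLE):
    print(dates, "|", body)
```

Keeping the pairs avoids re-parsing the dates later; joining `dates + " " + body` recovers roughly the strings the answers above produce.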