I have the text below, which was extracted using pdfminer.
Output from pdfminer :
Work Experience -Job Role/Establishment -Sales Assistant @ DFSDuration -June 2021 - PresentCurrently at DFS I work as a sales assistant, My role entails me helping customers withproduct enquiries and assisting customers needs,whilst also dealing with difficult/frustrated customers. At DFS a lot can go wrong so it’s essential to be able to deal withmany different types of objections and show understanding towards the customer at alltimes for their needs to be met. Within the role I am expected to achieve sales targetswhich I currently have no problems reaching.Job Role/Establishment -Plasterer @ MB Plasterer’sDuration -September 2016 - PresentWhilst working as a plasterer I have been able to develop my practical trading skills.Areas Of Expertise -●Customer Interaction●Customer Service●Resilience●Rapport Building●Trader●Warehouse WorkPersonal Skills -●Friendly●Confident●Articulate●Self Motivated●Punctual
Expected Output :
Work Experience -Job Role/Establishment -Sales Assistant @ DFSDuration -June 2021 - PresentCurrently at DFS I work as a sales assistant, My role entails me helping customers withproduct enquiries and assisting customers needs,whilst also dealing with difficult/frustrated customers. At DFS a lot can go wrong so it’s essential to be able to deal withmany different types of objections and show understanding towards the customer at alltimes for their needs to be met. Within the role I am expected to achieve sales targetswhich I currently have no problems reaching.Job Role/Establishment -Plasterer @ MB Plasterer’sDuration -September 2016 - PresentWhilst working as a plasterer I have been able to develop my practical trading skills.
In some documents the Work Experience section is labelled with other terms, such as EXPERIENCE, Job Experience and so on.
I want to write a generic regex to fetch the text between Work Experience and Areas Of Expertise.
The pattern I tried is below:
pattern = r'^(?:EXPERIENCE|Employment experience|Work Experience|Work Experience|WORK EXPERIENCE|Previous Employment|Work Experience -|Job experience|)\s*(\S.*?)\n(?:Skills|EDUCATION|Education|SKILLS|Areas Of Expertise)'
matches = re.findall(pattern, text, re.M | re.S)
print(matches)
But I am getting [] as output.
What am I missing, and how can this be approached?
CodePudding user response:
Try this pattern:
pattern = r'^(?:EXPERIENCE|Employment experience|Work Experience|Work Experience|WORK EXPERIENCE|Previous Employment|Work Experience -|Job experience)\s*(.+?)\n(?:Areas Of Expertise|SKILLS|Education|EDUCATION)'
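A minimal sketch of how a pattern of this shape behaves, assuming the extracted text keeps a line break before each section heading. The sample text is hypothetical and much shorter than the real output. In this sketch "Work Experience -" is listed before "Work Experience", because Python tries alternatives left to right and would otherwise leave the trailing " -" inside the captured group.

```python
import re

# Hypothetical sample: assumes pdfminer kept a newline before each heading.
text = (
    "Work Experience -\n"
    "Sales Assistant @ DFS\n"
    "June 2021 - Present\n"
    "Areas Of Expertise -\n"
    "Customer Service\n"
)

pattern = (
    r'^(?:EXPERIENCE|Employment experience|Work Experience -|Work Experience'
    r'|WORK EXPERIENCE|Previous Employment|Job experience)'
    # Lazy .+? stops at the first newline that is followed by an end heading.
    r'\s*(.+?)\n(?:Areas Of Expertise|SKILLS|Education|EDUCATION)'
)

# re.M lets ^ match at each line start; re.S lets . cross line breaks.
matches = re.findall(pattern, text, re.M | re.S)
print(matches)
```

If the extracted text has no newline directly before the closing heading (the output pasted in the question appears to run "skills.Areas Of Expertise" together), the \n in the pattern cannot match and findall still returns [].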
CodePudding user response:
For the first part of your regex you can use:
^(?:EXPERIENCE|Employment experience|Work Experience|Work Experience|WORK EXPERIENCE|Previous Employment|Work Experience -|Job experience)
After that you want everything in between: (.*)
up until a specific substring: (?=Areas Of Expertise)
Combined:
^(?:EXPERIENCE|Employment experience|Work Experience|Work Experience|WORK EXPERIENCE|Previous Employment|Work Experience -|Job experience)(.*)(?=Areas Of Expertise)
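Here (?=...) is a lookahead: it checks that the keyword follows but leaves it out of the match, as explained in "Regex Match all characters between two strings". Note that . does not match newlines unless re.S is set, so keep that flag if the section spans several lines.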
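The lookahead approach can be sketched in Python as follows. This is an illustration under assumptions rather than a definitive solution: the helper name extract_experience, the list of end headings, and the shortened sample text are all hypothetical, and (.*?) is made lazy so the match stops at the first end heading instead of the last.

```python
import re

START = (r'^(?:EXPERIENCE|Employment experience|Work Experience'
         r'|WORK EXPERIENCE|Previous Employment|Job experience)')
END = r'(?:Areas Of Expertise|Skills|SKILLS|Education|EDUCATION)'

# \s*-?\s* swallows an optional " -" after the heading.
# (?=END) asserts that an end heading follows without consuming it.
SECTION = re.compile(START + r'\s*-?\s*(.*?)(?=' + END + r')', re.M | re.S)

def extract_experience(text):
    """Return the text between a start heading and the first end heading."""
    m = SECTION.search(text)
    return m.group(1).strip() if m else None

# Shortened from the question's output; note there are no newlines at all.
sample = (
    "Work Experience -Job Role/Establishment -Plasterer @ MB Plasterer’s"
    "Duration -September 2016 - PresentWhilst working as a plasterer I have "
    "been able to develop my practical trading skills."
    "Areas Of Expertise -Customer InteractionPersonal Skills -Friendly"
)

result = extract_experience(sample)
print(result)
```

Because no \n is required before the end heading, this also works when pdfminer runs the sections together. One caveat: the match is case-sensitive, so a capitalised "Skills" or "Education" inside the body text would end the section early.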