I have a column with data like the rows below. From each row I need to extract the one word that comes after "approved by", "Approver is", "Approval from", and so on, i.e. the first word/name that follows the keyword "approv". The match on "approv" should be case-insensitive.
Example:
row 1- incident 12345, issue is so and so, solution is so and so.Ticket was approved by thors
row 2-incident 12900, issue is so and so, solution is so and so. approver is : Wanda Advocate worked julie
row 3-incident 125790, issue is so and so, solution is so and so. Ticket was got approval from- spiderman, closing
row 4-incident 125790, issue is so and so, solution is so and so. Ticket was approved by- ironman, blah blah
I tried \bApprov*\b([\w][A-Za-z]{4-7}), but it's not working.
CodePudding user response:
This solution is quite similar to your attempt; I hope it works for you. At least for this particular case, it returns the output you need:
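import regex as re  # third-party "regex" package (pip install regex), not the built-in re

string = """row 1- incident 12345, issue is so and so, solution is so and so.Ticket was approved by thors
row 2-incident 12900, issue is so and so, solution is so and so. approver is : Wanda Advocate worked julie
row 3-incident 125790, issue is so and so, solution is so and so. Ticket was got approval from- spiderman, closing
row 4-incident 125790, issue is so and so, solution is so and so. Ticket was approved by- ironman, blah blah"""

for row in string.split("\n"):
    if row.startswith("row"):
        # Variable-length lookbehind ("regex" supports it, the built-in re does not):
        # "approv", then any run of letters, whitespace, "-" or ":", then a space.
        # The name must have at least 5 letters, which skips "by" and "from".
        m = re.search(r"(?i)(?<=approv[A-Z\s\-\:]* )[A-Z]{5,}", row)
        if m:
            print(m.group(0))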
Output:
thors
Wanda
spiderman
ironman
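If you would rather stay with the built-in re module, a capture group avoids the lookbehind altogether and does not need a minimum name length. This is only a minimal sketch: it assumes the only filler words between the keyword and the name are "by", "is" and "from", and the names APPROV_RE and first_word_after_approv are made up for illustration.

```python
import re

# "approv" plus the rest of that word, then optionally one filler word
# (assumed list: by / is / from), then the captured name.
APPROV_RE = re.compile(
    r"\bapprov\w*\W+(?:(?:by|is|from)\W+)?(\w+)",
    re.IGNORECASE,
)

def first_word_after_approv(text):
    """Return the first non-filler word after an 'approv...' keyword, or None."""
    m = APPROV_RE.search(text)
    return m.group(1) if m else None

rows = [
    "incident 12345, issue is so and so, solution is so and so.Ticket was approved by thors",
    "incident 12900, issue is so and so, solution is so and so. approver is : Wanda Advocate worked julie",
    "incident 125790, issue is so and so, solution is so and so. Ticket was got approval from- spiderman, closing",
    "incident 125790, issue is so and so, solution is so and so. Ticket was approved by- ironman, blah blah",
]

for row in rows:
    print(first_word_after_approv(row))
```

Rows without the keyword give None instead of raising an error, and the original casing of the name is kept.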
CodePudding user response:
Do you want to do this in Python? If so, the code below may be helpful.
code:
rows = ['incident 12345, issue is so and so, solution is so and so.Ticket was approved by Thors'
, 'incident 12900, issue is so and so, solution is so and so. approver is : Wanda Advocate worked julie'
, 'incident 125790, issue is so and so, solution is so and so. Ticket was got approval from- spiderman, closing'
, 'incident 125790, issue is so and so, solution is so and so. Ticket was approved by- ironman, blah blah']
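for row in rows:
    # Strip punctuation so it does not stick to the words.
    clean_row = row.translate({ord(x): None for x in ',.;:[]()-'})
    # After the last "approv", word 0 is the keyword's suffix ("ed", "er", "al"),
    # word 1 is one filler word ("by", "is", "from"), and word 2 is the name.
    split_row = clean_row.lower().split('approv')[-1].split()[2]
    print(split_row)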
output:
thors
wanda
spiderman
ironman
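The fixed index [2] above depends on exactly one filler word sitting between the keyword and the name. A possible variant, sketched here under the assumption that the filler words are limited to "by", "is" and "from", walks the words instead and skips fillers. FILLER_WORDS and name_after_approval are hypothetical names, not part of the original answer.

```python
# Assumed filler words that may sit between the keyword and the name.
FILLER_WORDS = {"by", "is", "from"}

# Map each punctuation character to a space so "so.Ticket" becomes two words.
PUNCT_TO_SPACE = str.maketrans({c: " " for c in ",.;:[]()-"})

def name_after_approval(row):
    """Return the first non-filler word after a word starting with 'approv'."""
    words = row.translate(PUNCT_TO_SPACE).split()
    for i, word in enumerate(words):
        if word.lower().startswith("approv"):
            for candidate in words[i + 1:]:
                if candidate.lower() not in FILLER_WORDS:
                    return candidate
            return None  # keyword found, but nothing usable after it
    return None  # keyword not found

rows = [
    "incident 12345, issue is so and so, solution is so and so.Ticket was approved by Thors",
    "incident 12900, issue is so and so, solution is so and so. approver is : Wanda Advocate worked julie",
    "incident 125790, issue is so and so, solution is so and so. Ticket was got approval from- spiderman, closing",
    "incident 125790, issue is so and so, solution is so and so. Ticket was approved by- ironman, blah blah",
]

for row in rows:
    print(name_after_approval(row))
```

Unlike the lower() version, this keeps the name's original casing and returns None when the keyword is missing.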