Write a regular expression to extract a word that follows a pattern

I have a column with data like the rows below. From each row I need to extract the single word that comes after "approved by", "Approver is", "Approval from", etc., that is, the first word/name that follows the keyword "approv". Matching "approv" should be case-insensitive.

Example:

row 1- incident 12345, issue is so and so, solution is so and so.Ticket was approved by thors

row 2-incident 12900, issue is so and so, solution is so and so. approver is : Wanda Advocate worked julie

row 3-incident 125790, issue is so and so, solution is so and so. Ticket was got approval from- spiderman, closing

row 4-incident 125790, issue is so and so, solution is so and so. Ticket was approved by- ironman, blah blah

I tried \bApprov*\b([\w][A-Za-z]{4-7}), but it's not working.

CodePudding user response：

This solution is quite similar to yours; I hope it works for you. At least for this particular case, it returns the output you need:

import regex as re

string = """row 1- incident 12345, issue is so and so, solution is so and so.Ticket was approved by thors

row 2-incident 12900, issue is so and so, solution is so and so. approver is : Wanda Advocate worked julie

row 3-incident 125790, issue is so and so, solution is so and so. Ticket was got approval from- spiderman, closing

row 4-incident 125790, issue is so and so, solution is so and so. Ticket was approved by- ironman, blah blah"""

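for row in string.split("\n"):
    if row.startswith("row"):
        # The "+" makes the lookbehind variable-length, which the third-party
        # `regex` module supports (the standard-library `re` does not).
        m = re.search(r"(?i)(?<=approv[A-Z\s\-\:]+ )[A-Z]{5,}", row)
        # Guard against rows with no match (names under 5 letters are skipped).
        print(m.group(0) if m else None)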

Output:

thors
Wanda
spiderman
ironman
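
If you would rather avoid the third-party `regex` dependency, a capture group can stand in for the lookbehind. The following is only a sketch: the helper name `extract_approver` and the filler-word list `by|is|from` are my own assumptions based on the four sample rows, so extend the list if your data uses other phrasings.

```python
import re
from typing import Optional

# Assumption: at most one filler word (by / is / from) sits between the
# "approv..." keyword and the name. Extend the alternation for other data.
APPROVER_RE = re.compile(
    r"approv\w*"                   # approved, approver, approval, ...
    r"\W+(?:(?:by|is|from)\W+)?"   # separators plus an optional filler word
    r"(\w+)",                      # the word we want to capture
    re.IGNORECASE,
)


def extract_approver(text: str) -> Optional[str]:
    """Return the first word after the 'approv' keyword, or None if absent."""
    m = APPROVER_RE.search(text)
    return m.group(1) if m else None


rows = [
    "incident 12345, issue is so and so, solution is so and so.Ticket was approved by thors",
    "incident 12900, issue is so and so, solution is so and so. approver is : Wanda Advocate worked julie",
    "incident 125790, issue is so and so, solution is so and so. Ticket was got approval from- spiderman, closing",
    "incident 125790, issue is so and so, solution is so and so. Ticket was approved by- ironman, blah blah",
]

results = [extract_approver(row) for row in rows]
for name in results:
    print(name)
```

On the sample rows this prints thors, Wanda, spiderman and ironman. Unlike the `{5,}` version, it does not require the name to have a minimum length, but it will return the wrong word if the filler is something outside the list.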

CodePudding user response：

Do you want to achieve this using Python? If so, the code below may be helpful.

code:

    rows = ['incident 12345, issue is so and so, solution is so and so.Ticket was approved by Thors'
            , 'incident 12900, issue is so and so, solution is so and so. approver is : Wanda Advocate worked julie'
            , 'incident 125790, issue is so and so, solution is so and so. Ticket was got approval from- spiderman, closing'
            , 'incident 125790, issue is so and so, solution is so and so. Ticket was approved by- ironman, blah blah']

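    for row in rows:
        # Strip punctuation so it does not stick to the words.
        clean_row = row.translate({ord(x): None for x in ',.;:[]()-'})
        # Take the text after the last "approv" and pick the third token:
        # token 0 is the keyword ending ("ed", "er", "al"), token 1 the filler ("by", "is", "from").
        split_row = clean_row.lower().split('approv')[-1].split()[2]
        print(split_row)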

output:

    thors
    wanda
    spiderman
    ironman