Extract a name after certain string

Time:08-18

I am quite new to Python (3.9) but with everything available online I thought I might be able to solve a problem.

I am trying to extract a person's name from an invoice. The name may be 2-3 consecutive words of any length and may occasionally contain a hyphen.

Phone: (111) 311-1111
Desired Name:   Friday twk-test Date of Birth:   01/01/1988

Here is what I have so far:

(?<=Desired Name:\s{3}[A-Za-z])[A-Za-z]+\s[A-Za-z]+

Match:

riday twk

The output needs to be:

Friday twk-test

CodePudding user response:

You can use

\bDesired Name:\s*([^\W\d_]+(?:[\s-]+[^\W\d_]+){1,2})

See the regex demo.

Details:

  • \b - a word boundary
  • Desired Name: - a literal string
  • \s* - zero or more whitespaces
  • ([^\W\d_]+(?:[\s-]+[^\W\d_]+){1,2}) - Group 1: two or three words consisting of only Unicode letters that are separated with one or more whitespaces or hyphens:
    • [^\W\d_]+ - one or more Unicode letters
    • (?:[\s-]+[^\W\d_]+){1,2} - one or two sequences of:
      • [\s-]+ - one or more whitespaces or - chars
      • [^\W\d_]+ - one or more Unicode letters.
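
To illustrate why [^\W\d_] acts as a "Unicode letter" class, here is a minimal sketch. The sample string is made up for illustration, and the non-ASCII letters are written as escapes only so the example does not depend on file encoding:

```python
import re

# [^\W\d_] = "word characters, minus digits, minus underscore", i.e. letters.
LETTERS = re.compile(r"[^\W\d_]+")

# Hypothetical sample: "José_99 twk-test Ünal" with é and Ü written as escapes.
sample = "Jos\u00e9_99 twk-test \u00dcnal"

words = LETTERS.findall(sample)
print(words)
# The underscore, the digits, the hyphen and the spaces all split the runs,
# while the accented letters stay inside their words.
```

In Python 3, str patterns are Unicode-aware by default, so no extra flag is needed for this behaviour.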

If there can only be a single whitespace or hyphen between the words, remove the + after [\s-].

See a Python demo:

import re
text = "Phone: (111) 311-1111\nDesired Name:   Friday twk-test Date of Birth:   01/01/1988"
pattern = r"\bDesired Name:\s*([^\W\d_]+(?:[\s-]+[^\W\d_]+){1,2})"
match = re.search(pattern, text)
if match:
    print(match.group(1))
# => Friday twk-test
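
As a rough comparison of the two separator variants ([\s-]+ versus a single [\s-]), the sketch below runs both against the invoice line from the question and against a made-up line with a double space inside the name. The helper extract_name and the second sample are assumptions added for illustration only:

```python
import re

# Pattern from the answer: one or more separators between words ([\s-]+).
MULTI_SEP = re.compile(r"\bDesired Name:\s*([^\W\d_]+(?:[\s-]+[^\W\d_]+){1,2})")
# Stricter variant: exactly one whitespace or hyphen between words ([\s-]).
SINGLE_SEP = re.compile(r"\bDesired Name:\s*([^\W\d_]+(?:[\s-][^\W\d_]+){1,2})")


def extract_name(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


# First line is from the question; the second is a hypothetical variation
# with two spaces between "Friday" and "twk-test".
single_spaced = "Desired Name:   Friday twk-test Date of Birth:   01/01/1988"
double_spaced = "Desired Name:   Friday  twk-test Date of Birth:   01/01/1988"

for pattern in (MULTI_SEP, SINGLE_SEP):
    print(repr(extract_name(pattern, single_spaced)),
          repr(extract_name(pattern, double_spaced)))
```

Both variants return the full name for the single-spaced line. Only the [\s-]+ variant still matches once a second space appears inside the name; the stricter one returns None there.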

CodePudding user response:

Assuming that all of your invoices follow this same structure, then you can use this regex:

\bDesired Name:\s*([A-Za-z\s\-]+?(?=\s+Date of Birth))

A demo is here: regex101 demo

What this does is:

  • \b: Word boundary
  • Desired Name:: string to match that we know is before the name
  • \s*: match zero or more whitespaces
  • ([A-Za-z\s\-]+?(?=\s+Date of Birth)): A capturing group to match the name
    • [A-Za-z\s\-]+?: lazily match one or more letters (either upper or lower case), whitespace characters, or hyphens.
    • (?=\s+Date of Birth): positive lookahead, so the match stops just before the whitespace that precedes Date of Birth.

What this means is that the name is captured in full no matter how many words or hyphens it contains: if someone's first name and last name both have a hyphen, and they also have another name, the entire name will still be captured.
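
A minimal Python sketch of this lookahead approach, assuming every invoice has Date of Birth right after the name. The helper name_before_dob and the second and third sample lines are made up for illustration:

```python
import re

# Lazy run of ASCII letters, whitespace and hyphens, ending right before
# the whitespace that precedes "Date of Birth".
NAME_BEFORE_DOB = re.compile(
    r"\bDesired Name:\s*([A-Za-z\s\-]+?(?=\s+Date of Birth))"
)


def name_before_dob(text):
    """Return the captured name, or None when the pattern does not match."""
    match = NAME_BEFORE_DOB.search(text)
    return match.group(1) if match else None


samples = [
    # Line from the question.
    "Phone: (111) 311-1111\nDesired Name:   Friday twk-test Date of Birth:   01/01/1988",
    # Hypothetical two-word name.
    "Desired Name:   John Smith Date of Birth:   05/05/1975",
    # Hypothetical name with two hyphenated parts plus an extra word.
    "Desired Name:   Mary-Jane Smith-Jones Lee Date of Birth:   02/02/1990",
]

names = [name_before_dob(s) for s in samples]
for name in names:
    print(name)
```

Because the character class only lists A-Z and a-z, a name containing accented letters would not match with this pattern as written.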
