Regex search pattern to extract string with 2 words limit

Time:02-11

I want to build a pattern that uses one string as the starting point and extracts everything from it up to a second string, i.e. from Papa Johns through Inc. (Papa Johns some_text Inc.).

my_string_1 = 'Papa Johns'

my_string_2 = 'Inc.'

I need to search in any of the sentences below.

sent_1 = The company Papa Johns Retail Chain Inc. sells pizza, pastas etc.

sent_2 = The company Papa Johns Retail Chain., Inc. sells pizza, pastas etc.

sent_3 = The company Papa Johns Retail Chain, Inc. sells pizza, pastas etc.

sent_4 = The company Papa Johns Retail., Chain, Inc. sells pizza, pastas etc.

sent_5 = The company Papa Johns Retail, Inc. sells pizza, pastas etc.

I built the pattern pattern = '''Papa Johns (.{,30})Inc.''' and it works fine.

Is it possible to drop the 30-character condition and instead use a limit of 2 words in between (maybe split on spaces), so that the required text is extracted from all of the sentences?

CodePudding user response:

You could use the pattern:

\bPapa Johns(?: \S+){0,2} Inc\.

This matches Papa Johns ... Inc. with at most 2 words in between.

Python script:

import re

inp = ["The company Papa Johns Retail Chain Inc. sells pizza, pastas etc.", "The company Papa Johns New Retail Chain Inc. sells pizza, pastas etc."]
for i in inp:
    # Allow at most two space-separated words between "Papa Johns" and "Inc."
    if re.search(r'\bPapa Johns(?: \S+){0,2} Inc\.', i):
        print("MATCH:    " + i)
    else:
        print("NO MATCH: " + i)

This prints:

MATCH:    The company Papa Johns Retail Chain Inc. sells pizza, pastas etc.
NO MATCH: The company Papa Johns New Retail Chain Inc. sells pizza, pastas etc.
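
The script above only reports whether a sentence matches. Since the question asks to extract the text and starts from two variable strings, here is a minimal sketch of one way to do that. The helper name build_pattern and its max_words parameter are assumptions for illustration, not part of the original answer. Note that \S+ treats any run of non-space characters as a word, so punctuation such as "Chain.," counts as part of the word.

```python
import re

def build_pattern(start, end, max_words=2):
    # Hypothetical helper: escape both literal strings so that characters
    # like "." are matched literally, then allow at most max_words
    # space-separated words between them.
    return re.compile(
        r'\b' + re.escape(start)
        + r'(?: \S+){0,' + str(max_words) + r'} '
        + re.escape(end)
    )

my_string_1 = 'Papa Johns'
my_string_2 = 'Inc.'
pattern = build_pattern(my_string_1, my_string_2)

# The five sentences from the question
sentences = [
    "The company Papa Johns Retail Chain Inc. sells pizza, pastas etc.",
    "The company Papa Johns Retail Chain., Inc. sells pizza, pastas etc.",
    "The company Papa Johns Retail Chain, Inc. sells pizza, pastas etc.",
    "The company Papa Johns Retail., Chain, Inc. sells pizza, pastas etc.",
    "The company Papa Johns Retail, Inc. sells pizza, pastas etc.",
]

def extract(sentence):
    # Return the matched span from the start string through the end string,
    # or None when more than max_words words sit in between.
    m = pattern.search(sentence)
    return m.group(0) if m else None

for s in sentences:
    print(extract(s))
```

With these inputs, all five sentences from the question should yield a match, while a sentence with three words in between (such as "Papa Johns New Retail Chain Inc.") returns None.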