I want to build a pattern that uses my_string_1 as the starting point and extracts everything from there up to my_string_2 ("Inc."), i.e. Papa Johns some_text Inc.
my_string_1 = 'Papa Johns'
my_string_2 = 'Inc.'
The pattern needs to work on any of the sentences below.
sent_1 = The company Papa Johns Retail Chain Inc. sells pizza, pastas etc.
sent_2 = The company Papa Johns Retail Chain., Inc. sells pizza, pastas etc.
sent_3 = The company Papa Johns Retail Chain, Inc. sells pizza, pastas etc.
sent_4 = The company Papa Johns Retail., Chain, Inc. sells pizza, pastas etc.
sent_5 = The company Papa Johns Retail, Inc. sells pizza, pastas etc.
I built the pattern pattern = '''Papa Johns (.{,30})Inc.'''
and it works fine.
Is it possible to drop the 30-character condition and instead allow at most 2 words (for example, split on spaces) between the two strings, so that the required text is extracted from all of the sentences?
CodePudding user response:
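You could use the pattern:
\bPapa Johns(?: \S+){0,2} Inc\.
This matches Papa Johns ... Inc.
with at most 2 words in between. Each (?: \S+) group consumes one space followed by a run of non-whitespace characters, so a token with trailing punctuation such as "Chain.," still counts as a single word.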
Python script:
import re

inp = ["The company Papa Johns Retail Chain Inc. sells pizza, pastas etc.", "The company Papa Johns New Retail Chain Inc. sells pizza, pastas etc."]
for i in inp:
    # At most two space-separated words may sit between "Papa Johns" and "Inc."
    if re.search(r'\bPapa Johns(?: \S+){0,2} Inc\.', i):
        print("MATCH: " + i)
    else:
        print("NO MATCH: " + i)
This prints:
MATCH: The company Papa Johns Retail Chain Inc. sells pizza, pastas etc.
NO MATCH: The company Papa Johns New Retail Chain Inc. sells pizza, pastas etc.
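Since the question asks to extract the text rather than just test for a match, here is a minimal sketch of how the same pattern could be built from the two strings and applied to all five sentences. The helper names build_pattern and extract are hypothetical (they are not part of the original answer), and re.escape is used on the assumption that both strings should be matched literally.

```python
import re

def build_pattern(start, end, max_words=2):
    # Hypothetical helper: re.escape keeps characters like the "." in "Inc." literal.
    # The middle part allows 0..max_words tokens, each one a space followed by
    # a run of non-whitespace characters.
    middle = r"(?: \S+){0," + str(max_words) + r"}"
    return re.compile(r"\b" + re.escape(start) + middle + " " + re.escape(end))

def extract(sentence, start="Papa Johns", end="Inc.", max_words=2):
    # Hypothetical helper: return the matched span, or None when there is no match.
    m = build_pattern(start, end, max_words).search(sentence)
    return m.group(0) if m else None

sentences = [
    "The company Papa Johns Retail Chain Inc. sells pizza, pastas etc.",
    "The company Papa Johns Retail Chain., Inc. sells pizza, pastas etc.",
    "The company Papa Johns Retail Chain, Inc. sells pizza, pastas etc.",
    "The company Papa Johns Retail., Chain, Inc. sells pizza, pastas etc.",
    "The company Papa Johns Retail, Inc. sells pizza, pastas etc.",
]

for s in sentences:
    print(extract(s))
```

Because a word here is any run of non-whitespace characters, punctuation attached to a word does not count against the 2-word limit. A sentence with three words between the two strings yields None.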