I'm using
\b(small|medium|large)(?:\W+\w+){1,6}?\W+(cheese|pepperoni|sausage)\b
which I found here: https://www.regular-expressions.info/near.html
It only matches when the size word comes before the topping word. Is there a way to also match the reverse order, i.e. find the two words regardless of which one comes first?
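The problem can be seen with a quick Python check of the pattern above. The helper name size_first_match and the sample strings are made up for illustration. Note that {1,6} requires at least one word between the two key words, and that without re.IGNORECASE only lowercase words match.

```python
import re

# The pattern from the question: a size word, then 1 to 6 intervening
# words, then a topping word. The order is fixed: the size must come first.
SIZE_FIRST = re.compile(
    r"\b(small|medium|large)(?:\W+\w+){1,6}?\W+(cheese|pepperoni|sausage)\b"
)

def size_first_match(text):
    """Return (size, topping) if the one-directional pattern matches, else None."""
    m = SIZE_FIRST.search(text)
    return m.groups() if m else None

print(size_first_match("large pizza with pepperoni"))  # ('large', 'pepperoni')
print(size_first_match("pepperoni pizza, large"))      # None: reverse order
print(size_first_match("large pepperoni"))             # None: no word in between
```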
Answer:
We can join two regex patterns with |, which means OR, so the regex can look for our words in either order.
I have allowed the first letter of each word to be upper or lower case, and for there to be up to 2 words (each at most 10 characters long) between the key words. This makes for a long pattern.
(\b([Ss]mall|[Mm]edium|[Ll]arge)\b(\W+\w{1,10}){0,2}(\W+)\b([Cc]heese|[Pp]epperoni|[Ss]ausage)\b)|(\b([Cc]heese|[Pp]epperoni|[Ss]ausage)\b(\W+\w{1,10}){0,2}(\W+)\b([Ss]mall|[Mm]edium|[Ll]arge)\b)
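Writing both directions out by hand is error-prone, so here is a sketch (not part of the original answer) that builds the same A...B | B...A alternation from two word lists. The function names near_either_order and find_pair are hypothetical. It uses re.IGNORECASE instead of the [Ss]-style classes, which is broader: it also accepts "LARGE", whereas [Ll]arge only lets the first letter vary.

```python
import re

def near_either_order(words_a, words_b, max_between=2, max_word_len=10):
    """Compile a regex that matches a word from words_a and a word from
    words_b in either order, with 0..max_between short words between them."""
    a = "|".join(map(re.escape, words_a))
    b = "|".join(map(re.escape, words_b))
    # Up to max_between intervening words of 1..max_word_len word characters,
    # then the separator before the second key word.
    gap = rf"(?:\W+\w{{1,{max_word_len}}}){{0,{max_between}}}\W+"
    # A...B | B...A; distinct group names record which order matched.
    pattern = (
        rf"\b(?:(?P<a1>{a}){gap}(?P<b1>{b})"
        rf"|(?P<b2>{b}){gap}(?P<a2>{a}))\b"
    )
    return re.compile(pattern, re.IGNORECASE)

def find_pair(regex, text):
    """Return (word_a, word_b) for the first match, or None."""
    m = regex.search(text)
    if m is None:
        return None
    return (m["a1"] or m["a2"], m["b1"] or m["b2"])

SIZE_TOPPING = near_either_order(["small", "medium", "large"],
                                 ["cheese", "pepperoni", "sausage"])

for s in ["Large pizza with Pepperoni", "Pepperoni pizza size Large",
          "Large but not food", "small mac cheese"]:
    print(s, "->", find_pair(SIZE_TOPPING, s))
```

Using separate named groups per direction avoids having to check which half of the alternation matched by group number.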
Answer:
We can capture up to 25 characters before and after a match for one of the first three words (small, medium, large), and only accept the string if one of the other three words (cheese, pepperoni, sausage) is found in the captured group.
import re

strings = ['Large pizza with Pepperoni', 'Pepperoni pizza size Large', 'Large but not food', 'cheese no size', 'small mac cheese']
for s in strings:
    # Each match is (window of up to 25 chars either side, size word)
    f1 = re.findall(r'(.{0,25}\b(small|medium|large)\b.{0,25})', s, re.IGNORECASE)
    if f1:
        # Look for a topping word only inside the first captured window
        f2 = re.findall(r'(.{0,25}\b(cheese|pepperoni|sausage)\b.{0,25})', f1[0][0], re.IGNORECASE)
        if f2:
            print('<', s, '> contains <', f1[0][1], '> and <', f2[0][1], '>')
output
< Large pizza with Pepperoni > contains < Large > and < Pepperoni >
< Pepperoni pizza size Large > contains < Large > and < Pepperoni >
< small mac cheese > contains < small > and < cheese >
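The code above only inspects f1[0], so a string whose first matched size word has no topping nearby is rejected even if a later size word does. Below is a variant sketch of the same windowing idea that uses re.finditer to check every size word. The function name size_topping_pairs and the last sample string are hypothetical. As in the original, slicing can cut a word at the window edge, so a partial word there may still count as a match.

```python
import re

SIZES = re.compile(r"\b(small|medium|large)\b", re.IGNORECASE)
TOPPINGS = re.compile(r"\b(cheese|pepperoni|sausage)\b", re.IGNORECASE)

def size_topping_pairs(text, window=25):
    """Return (size, topping) pairs where a topping word appears within
    `window` characters before or after a size word."""
    pairs = []
    for size in SIZES.finditer(text):
        # Text around this size word, clamped to the start of the string.
        nearby = text[max(size.start() - window, 0):size.end() + window]
        topping = TOPPINGS.search(nearby)
        if topping:
            pairs.append((size.group(1), topping.group(1)))
    return pairs

strings = ['Large pizza with Pepperoni', 'Pepperoni pizza size Large',
           'Large but not food', 'cheese no size', 'small mac cheese',
           'large drink please and also one small cheese pizza']
for s in strings:
    print(s, '->', size_topping_pairs(s))
```

For the last string, the original loop prints nothing because its first window (around "large") contains no topping; this variant reports the ("small", "cheese") pair.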