Python Regex: Match text between two keywords when the text contain a specific word

Time:11-07

I'm looking for a regex that matches the whole text of every segment between two OR operators, but only if that segment contains one or more ANDs. For instance:

This should match:

OR "Message:\"An Arm and a Leg \<Meaning\>: Something that is extremely expensive.\"" AND "Message:\"Jaws of Death \<Meaning\>: Being in a dangerous or very deadly situation.\"" OR

OR "Message:\"Know the Ropes \<Meaning\>: Having a familiarity or understanding of how something works.\"" AND "Message:\"Poke Fun At \<Meaning\>: Making fun of something or someone; ridicule.\"" AND "Message:\"Give a Man a Fish \<Meaning\>: It's better to teach a person how to do something than to do that something for them.\"" AND "Message:\"Money Doesn't Grow On Trees \<Meaning\>: Suggests that money is a resource that must be earned and is not one that's easily acquired.\"" AND "Message:\"There's No I in Team \<Meaning\>: To not work alone, but rather, together with others in order to achieve a certain goal.\"" AND "Message:\"A Busy Bee \<Meaning\>: An industrious person.\"" AND "Message:\"Wake Up Call \<Meaning\>: An occurance of sorts that brings a problem to somebody's attention and they realize it needs fixing.\"" AND "Message:\"A Lot on One\'s Plate \<Meaning\>: A lot \(or too much\) to do or cope with.\"" AND "Message:\"Under the Weather \<Meaning\>: Not feeling well, in health or mood.\"" OR

This shouldn't match:

OR "Message:\"Break The Ice \<Meaning\>: Breaking down a social stiffness.\"" OR

This is placeholder text to use as an example:

"Message:\"Knock Your Socks Off \<Meaning\>: To be taken by surprise.\"" AND "Message:\"Playing For Keeps \<Meaning\>: Said when things are about to get serious.\"" OR "Message:\"Break The Ice \<Meaning\>: Breaking down a social stiffness.\"" OR "Message:\"Right Out of the Gate \<Meaning\>: Right from the beginning; to do something from the start.\"" OR "Message:\"Birds of a Feather Flock Together \<Meaning\>: People tend to associate with others who share similar interests or values.\"" AND "Message:\"Up In Arms \<Meaning\>: Angry; being roused to the point that you are ready to fight.\"" OR "Message:\"Know the Ropes \<Meaning\>: Having a familiarity or understanding of how something works.\"" AND "Message:\"Poke Fun At \<Meaning\>: Making fun of something or someone; ridicule.\"" AND "Message:\"Give a Man a Fish \<Meaning\>: It's better to teach a person how to do something than to do that something for them.\"" AND "Message:\"Money Doesn't Grow On Trees \<Meaning\>: Suggests that money is a resource that must be earned and is not one that's easily acquired.\"" AND "Message:\"There's No I in Team \<Meaning\>: To not work alone, but rather, together with others in order to achieve a certain goal.\"" AND "Message:\"A Busy Bee \<Meaning\>: An industrious person.\"" AND "Message:\"Wake Up Call \<Meaning\>: An occurance of sorts that brings a problem to somebody's attention and they realize it needs fixing.\"" AND "Message:\"A Lot on One\'s Plate \<Meaning\>: A lot \(or too much\) to do or cope with.\"" AND "Message:\"Under the Weather \<Meaning\>: Not feeling well, in health or mood.\"" OR "Message:\"A Day Late and a Dollar Short \<Meaning\>: Too late. 
A missed opportunity.\"" OR "Message:\"Back to Square One \<Meaning\>: To go back to the beginning; back to the drawing board.\"" OR "Message:\"An Arm and a Leg \<Meaning\>: Something that is extremely expensive.\"" AND "Message:\"Jaws of Death \<Meaning\>: Being in a dangerous or very deadly situation.\"" OR "Message:\"Barking Up The Wrong Tree \<Meaning\>: To make a wrong assumption about something.\"" OR "Message:\"Swinging For the Fences \<Meaning\>: Giving something your all.\"" OR "Message:\"Talk the Talk \<Meaning\>: Supporting what you say, not just with words, but also through action or evidence.\"" OR "Message:\"Back To the Drawing Board \<Meaning\>: Starting over again on a new design from a previously failed attempt.\"" OR "Message:\"On the Ropes \<Meaning\>: Being in a situation that looks to be hopeless!\"" OR "Message:\"Tug of War \<Meaning\>: It can refer to the popular rope pulling game or it can mean a struggle for authority.\"" AND "Message:\"A Dime a Dozen \<Meaning\>: Something that is extremely common.\"" AND "Message:\"In a Pickle \<Meaning\>: Being in a difficult predicament; a mess; an undesirable situation.\"" AND "Message:\"Ring Any Bells? 
\<Meaning\>: Recalling a memory; causing a person to remember something or someone.\"" AND "Message:\"When the Rubber Hits the Road \<Meaning\>: When something is about to begin, get serious, or put to the test.\"" AND "Message:\"Burst Your Bubble \<Meaning\>: To ruin someone's happy moment.\"" AND "Message:\"No Ifs, Ands, or Buts \<Meaning\>: Finishing a task without making any excuses.\"" AND "Message:\"Tough It Out \<Meaning\>: To remain resillient even in hard times; enduring.\"" OR "Message:\"Curiosity Killed The Cat \<Meaning\>: Typically said to indicate that any further investigation into a situation may lead to harm.\"" OR "Message:\"A Chip on Your Shoulder \<Meaning\>: Being angry about something that happened in the past.\"" OR "Message:\"A Cold Day in July \<Meaning\>: Something that is highly unlikely to happen.\"" OR "Message:\"Cry Over Spilt Milk \<Meaning\>: It's useless to worry about things that already happened and cannot be changed.\"" OR "Message:\"A Leg Up \<Meaning\>: Someone who's given an advantage over others.\"" OR "Message:\"It's Not Brain Surgery \<Meaning\>: A task that's easy to accomplish, a thing lacking complexity.\"" OR "Message:\"You Can't Judge a Book By Its Cover \<Meaning\>: Don't judge someone or something only by the outward appearance.\"" AND "Message:\"Down For The Count \<Meaning\>: Someone or something that looks to be defeated, or nearly so.\"" OR "Message:\"Yada Yada \<Meaning\>: A way to notify a person that what they're saying is predictable or boring.\"" AND "Message:\"Let Her Rip \<Meaning\>: Permission to start, or it could mean 'go faster!'\"" OR "Message:\"Wouldn't Harm a Fly \<Meaning\>: Nonviolent; someone who is mild or gentle.\"" OR "Message:\"Off One's Base \<Meaning\>: A person that is crazy or behaving in idiotic ways\"" AND "Message:\"Close But No Cigar \<Meaning\>: Coming close to a successful outcome only to fall short at the end.\"" AND "Message:\"It's Not All It's Cracked Up To Be \<Meaning\>: 
Failing to meet expectations; not being as good as people say.\"" AND "Message:\"What Am I, Chopped Liver? \<Meaning\>: A rhetorical question used by a person who feels they are being given less consideration than someone else.\"" AND "Message:\"A Dog in the Manger \<Meaning\>: Someone who prevents others from using valuable items even though they have no need for them.\"" AND "Message:\"A Bite at the Cherry \<Meaning\>: An opportunity that's not available to most people.\"" OR "Message:\"Don't Count Your Chickens Before They Hatch \<Meaning\>: Do not rely on something you are not sure of.\"

I'm using a positive lookbehind at the beginning and a positive lookahead at the end to set the boundaries. I tried (.*?AND.*?) to match any character between zero and unlimited times, as few times as possible. I tried:

(?<=OR)(.*?AND.*?)(?=OR)

(?<=OR) (?:[\s\S])*? AND (?:[\s\S\w] ?)(?=OR)

They stop matching at the OR (after the AND), but they do not start matching at the first OR before the AND.

CodePudding user response:

If I understand you correctly, you want to search for one or more ANDs between the ORs:

(?<=OR)((?:(?!OR).)+AND(?:(?!OR).)+)(?=OR)

Regex demo.
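
To see why the attempts in the question start too early, here is a minimal sketch on a short, made-up sample string (the `text` value is hypothetical and not taken from the question). It compares the lazy pattern from the question with the tempered pattern, both written with spaces around OR and AND so that words such as "or" inside the messages are not an issue:

```python
import re

# Hypothetical short sample standing in for the long text in the question.
text = 'A AND B OR C OR D AND E OR F OR G AND H AND I OR J'

# Attempt from the question: the lazy .*? is allowed to run across an OR
# until it finally reaches an AND, so the match starts too early.
naive = re.findall(r'(?<= OR )(.*?AND.*?)(?= OR )', text)

# Tempered pattern: (?:(?! OR ).)+ consumes one character at a time,
# but only while " OR " does not start at the current position.
tempered = re.findall(
    r'(?<= OR )((?:(?! OR ).)+ AND (?:(?! OR ).)+)(?= OR )', text)

print(naive)     # ['C OR D AND E', 'F OR G AND H AND I']
print(tempered)  # ['D AND E', 'G AND H AND I']
```

The leading 'A AND B' is not returned by either pattern because it is not preceded by an OR.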

CodePudding user response:

I suggest splitting on ' OR ' with a simple regex instead of searching with a complex one, to speed things up:

message = '''"Message:\"Knock Your Socks Off \<Meaning\>: To be taken by surprise.\"" AND "Message:\"Playing For Keeps \<Meaning\>: Said when things are about to get serious.\"" OR  "Message:\"Break The Ice \<Meaning\>: Breaking down a social stiffness.\"" OR "Message:\"Right Out of the Gate \<Meaning\>: Right from the beginning; to do something from the start.\"" OR "Message:\"Birds of a Feather Flock Together \<Meaning\>: People tend to associate with others who share similar interests or values.\"" AND "Message:\"Up In Arms \<Meaning\>: Angry; being roused to the point that you are ready to fight.\"" OR "Message:\"Know the Ropes \<Meaning\>: Having a familiarity or understanding of how something works.\"" AND "Message:\"Poke Fun At \<Meaning\>: Making fun of something or someone; ridicule.\"" AND "Message:\"Give a Man a Fish \<Meaning\>: It's better to teach a person how to do something than to do that something for them.\"" AND "Message:\"Money Doesn't Grow On Trees \<Meaning\>: Suggests that money is a resource that must be earned and is not one that's easily acquired.\"" AND "Message:\"There's No I in Team \<Meaning\>: To not work alone, but rather, together with others in order to achieve a certain goal.\"" AND "Message:\"A Busy Bee \<Meaning\>: An industrious person.\"" AND "Message:\"Wake Up Call \<Meaning\>: An occurance of sorts that brings a problem to somebody's attention and they realize it needs fixing.\"" AND "Message:\"A Lot on One\'s Plate \<Meaning\>: A lot \(or too much\) to do or cope with.\"" AND "Message:\"Under the Weather \<Meaning\>: Not feeling well, in health or mood.\"" OR "Message:\"A Day Late and a Dollar Short \<Meaning\>: Too late. 
A missed opportunity.\"" OR "Message:\"Back to Square One \<Meaning\>: To go back to the beginning; back to the drawing board.\"" OR "Message:\"An Arm and a Leg \<Meaning\>: Something that is extremely expensive.\"" AND "Message:\"Jaws of Death \<Meaning\>: Being in a dangerous or very deadly situation.\"" OR "Message:\"Barking Up The Wrong Tree \<Meaning\>: To make a wrong assumption about something.\"" OR "Message:\"Swinging For the Fences \<Meaning\>: Giving something your all.\"" OR "Message:\"Talk the Talk \<Meaning\>: Supporting what you say, not just with words, but also through action or evidence.\"" OR "Message:\"Back To the Drawing Board \<Meaning\>: Starting over again on a new design from a previously failed attempt.\"" OR "Message:\"On the Ropes \<Meaning\>: Being in a situation that looks to be hopeless!\"" OR "Message:\"Tug of War \<Meaning\>: It can refer to the popular rope pulling game or it can mean a struggle for authority.\"" AND "Message:\"A Dime a Dozen \<Meaning\>: Something that is extremely common.\"" AND "Message:\"In a Pickle \<Meaning\>: Being in a difficult predicament; a mess; an undesirable situation.\"" AND "Message:\"Ring Any Bells? 
\<Meaning\>: Recalling a memory; causing a person to remember something or someone.\"" AND "Message:\"When the Rubber Hits the Road \<Meaning\>: When something is about to begin, get serious, or put to the test.\"" AND "Message:\"Burst Your Bubble \<Meaning\>: To ruin someone's happy moment.\"" AND "Message:\"No Ifs, Ands, or Buts \<Meaning\>: Finishing a task without making any excuses.\"" AND "Message:\"Tough It Out \<Meaning\>: To remain resillient even in hard times; enduring.\"" OR "Message:\"Curiosity Killed The Cat \<Meaning\>: Typically said to indicate that any further investigation into a situation may lead to harm.\"" OR "Message:\"A Chip on Your Shoulder \<Meaning\>: Being angry about something that happened in the past.\"" OR "Message:\"A Cold Day in July \<Meaning\>: Something that is highly unlikely to happen.\"" OR "Message:\"Cry Over Spilt Milk \<Meaning\>: It's useless to worry about things that  already happened and cannot be changed.\"" OR "Message:\"A Leg Up \<Meaning\>: Someone who's given an advantage over others.\"" OR "Message:\"It's Not Brain Surgery \<Meaning\>: A task that's easy to accomplish, a thing lacking complexity.\"" OR "Message:\"You Can't Judge a Book By Its Cover \<Meaning\>: Don't judge someone or something only by the outward appearance.\"" AND "Message:\"Down For The Count \<Meaning\>: Someone or something that looks to be defeated, or nearly so.\"" OR "Message:\"Yada Yada \<Meaning\>: A way to notify a person that what they're saying is predictable or boring.\"" AND "Message:\"Let Her Rip \<Meaning\>: Permission to start, or it could mean 'go faster!'\"" OR "Message:\"Wouldn't Harm a Fly \<Meaning\>: Nonviolent; someone who is mild or gentle.\"" OR "Message:\"Off One's Base \<Meaning\>: A person that is crazy or behaving in idiotic ways\"" AND "Message:\"Close But No Cigar \<Meaning\>: Coming close to a successful outcome only to fall short at the end.\"" AND "Message:\"It's Not All It's Cracked Up To Be \<Meaning\>: 
Failing to meet expectations; not being as good as people say.\"" AND "Message:\"What Am I, Chopped Liver? \<Meaning\>: A rhetorical question used by a person who feels they are being given less consideration than someone else.\"" AND "Message:\"A Dog in the Manger \<Meaning\>: Someone who prevents others from using valuable items even though they have no need for them.\"" AND "Message:\"A Bite at the Cherry \<Meaning\>: An opportunity that's not available to most people.\"" OR "Message:\"Don't Count Your Chickens Before They Hatch \<Meaning\>: Do not rely on something you are not sure of.\"'''
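import re
# Split on ' OR ' and drop the first and last pieces, which are not enclosed by two ORs;
# then keep only the pieces that contain ' AND '.
selection = [item for item in re.split(' OR ', message)[1:-1]
             if ' AND ' in item]
print(*selection, sep='\n')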

giving

"Message:"Birds of a Feather Flock Together \<Meaning\>: People tend to associate with others who share similar interests or values."" AND "Message:"Up In Arms \<Meaning\>: Angry; being roused to the point that you are ready to fight.""
"Message:"Know the Ropes \<Meaning\>: Having a familiarity or understanding of how something works."" AND "Message:"Poke Fun At \<Meaning\>: Making fun of something or someone; ridicule."" AND "Message:"Give a Man a Fish \<Meaning\>: It's better to teach a person how to do something than to do that something for them."" AND "Message:"Money Doesn't Grow On Trees \<Meaning\>: Suggests that money is a resource that must be earned and is not one that's easily acquired."" AND "Message:"There's No I in Team \<Meaning\>: To not work alone, but rather, together with others in order to achieve a certain goal."" AND "Message:"A Busy Bee \<Meaning\>: An industrious person."" AND "Message:"Wake Up Call \<Meaning\>: An occurance of sorts that brings a problem to somebody's attention and they realize it needs fixing."" AND "Message:"A Lot on One's Plate \<Meaning\>: A lot \(or too much\) to do or cope with."" AND "Message:"Under the Weather \<Meaning\>: Not feeling well, in health or mood.""
"Message:"An Arm and a Leg \<Meaning\>: Something that is extremely expensive."" AND "Message:"Jaws of Death \<Meaning\>: Being in a dangerous or very deadly situation.""
"Message:"Tug of War \<Meaning\>: It can refer to the popular rope pulling game or it can mean a struggle for authority."" AND "Message:"A Dime a Dozen \<Meaning\>: Something that is extremely common."" AND "Message:"In a Pickle \<Meaning\>: Being in a difficult predicament; a mess; an undesirable situation."" AND "Message:"Ring Any Bells? \<Meaning\>: Recalling a memory; causing a person to remember something or someone."" AND "Message:"When the Rubber Hits the Road \<Meaning\>: When something is about to begin, get serious, or put to the test."" AND "Message:"Burst Your Bubble \<Meaning\>: To ruin someone's happy moment."" AND "Message:"No Ifs, Ands, or Buts \<Meaning\>: Finishing a task without making any excuses."" AND "Message:"Tough It Out \<Meaning\>: To remain resillient even in hard times; enduring.""
"Message:"You Can't Judge a Book By Its Cover \<Meaning\>: Don't judge someone or something only by the outward appearance."" AND "Message:"Down For The Count \<Meaning\>: Someone or something that looks to be defeated, or nearly so.""
"Message:"Yada Yada \<Meaning\>: A way to notify a person that what they're saying is predictable or boring."" AND "Message:"Let Her Rip \<Meaning\>: Permission to start, or it could mean 'go faster!'""
"Message:"Off One's Base \<Meaning\>: A person that is crazy or behaving in idiotic ways"" AND "Message:"Close But No Cigar \<Meaning\>: Coming close to a successful outcome only to fall short at the end."" AND "Message:"It's Not All It's Cracked Up To Be \<Meaning\>: Failing to meet expectations; not being as good as people say."" AND "Message:"What Am I, Chopped Liver? \<Meaning\>: A rhetorical question used by a person who feels they are being given less consideration than someone else."" AND "Message:"A Dog in the Manger \<Meaning\>: Someone who prevents others from using valuable items even though they have no need for them."" AND "Message:"A Bite at the Cherry \<Meaning\>: An opportunity that's not available to most people.""

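Since the separator is a fixed string, the same idea can also be written with plain str.split instead of re.split. The following is only a sketch: the function name and_groups and the short sample are hypothetical, chosen for illustration.

```python
def and_groups(text, outer=' OR ', inner=' AND '):
    """Return the segments enclosed by two `outer` separators that contain `inner`."""
    parts = text.split(outer)
    # parts[0] and parts[-1] have a separator on one side only, so skip them.
    return [part for part in parts[1:-1] if inner in part]

# Hypothetical short sample standing in for the long message above.
sample = 'A AND B OR C OR D AND E OR F OR G AND H AND I OR J'
print(and_groups(sample))  # ['D AND E', 'G AND H AND I']
```
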
Let's compare the approach using splitting with the approach using searching:

import re
from time import perf_counter as T
sT_1=T()
selection_1 = [item for item in re.split(' OR ', message)[1:-1] if ' AND ' in item]
eT_1=T()
sT_2=T()
# re.DOTALL lets '.' also match any line breaks inside the message
selection_2 = re.findall('(?<= OR )((?:(?! OR ).)+ AND (?:(?! OR ).)+)(?= OR )', message, flags=re.DOTALL)
eT_2=T()
assert selection_1 == selection_2
print(f'{(eT_2-sT_2) - (eT_1-sT_1):8.6f}, {(eT_1-sT_1):8.6f}, {(eT_2-sT_2):8.6f}')  

which prints:

0.000300, 0.000150, 0.000450

showing that, in this run, the approach using splitting is about 3 times faster than the approach using searching.
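
A single timed run like the one above can be noisy. As a rough sketch, the comparison could be repeated with timeit. The synthetic message below is an assumption made to keep the example self-contained; it is not the original data, and the numbers will differ from machine to machine.

```python
import re
import timeit

# Synthetic stand-in for the long message (an assumption, not the original data).
message = ' OR '.join(['A AND B', 'C', 'D AND E', 'F', 'G AND H AND I', 'J'] * 50)

pattern = re.compile(
    r'(?<= OR )((?:(?! OR ).)+ AND (?:(?! OR ).)+)(?= OR )', re.DOTALL)

def by_split():
    # Split on the literal separator and filter the enclosed pieces.
    return [item for item in message.split(' OR ')[1:-1] if ' AND ' in item]

def by_search():
    # Search with the tempered pattern from the other answer.
    return pattern.findall(message)

# Both approaches should select the same segments.
assert by_split() == by_search()

# The best of several repeats is less sensitive to background noise than one run.
t_split = min(timeit.repeat(by_split, number=200, repeat=5))
t_search = min(timeit.repeat(by_search, number=200, repeat=5))
print(f'split: {t_split:.6f}s  search: {t_search:.6f}s')
```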
