Match any of two phrases but not a third one

Time:11-24

I have a list of links, each of which also contains an SVG icon inside the a element (I mention this because it makes my pattern more complex), and I want to grab two particular ones.

So if this is the subject to search within:

            <h2>title</h2>      
        Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.      
        
            <a href="#" role="button">
            <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path d="M288 32c0-17.7-14.3-32-32-32s-32 14.3-32 32V274.7l-73.4-73.4c-12.5-12.5-32.8-12.5-45.3 0s-12.5 32.8 0 45.3l128 128c12.5 12.5 32.8 12.5 45.3 0l128-128c12.5-12.5 12.5-32.8 0-45.3s-32.8-12.5-45.3 0L288 274.7V32zM64 352c-35.3 0-64 28.7-64 64v32c0 35.3 28.7 64 64 64H448c35.3 0 64-28.7 64-64V416c0-35.3-28.7-64-64-64H346.5l-45.3 45.3c-25 25-65.5 25-90.5 0L165.5 352H64zM432 456c-13.3 0-24-10.7-24-24s10.7-24 24-24s24 10.7 24 24s-10.7 24-24 24z"/></svg>      
                        Download the warranty
                    </a>
   
<a href="#" role="button">
                <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path d="M288 32c0-17.7-14.3-32-32-32s-32 14.3-32 32V274.7l-73.4-73.4c-12.5-12.5-32.8-12.5-45.3 0s-12.5 32.8 0 45.3l128 128c12.5 12.5 32.8 12.5 45.3 0l128-128c12.5-12.5 12.5-32.8 0-45.3s-32.8-12.5-45.3 0L288 274.7V32zM64 352c-35.3 0-64 28.7-64 64v32c0 35.3 28.7 64 64 64H448c35.3 0 64-28.7 64-64V416c0-35.3-28.7-64-64-64H346.5l-45.3 45.3c-25 25-65.5 25-90.5 0L165.5 352H64zM432 456c-13.3 0-24-10.7-24-24s10.7-24 24-24s24 10.7 24 24s-10.7 24-24 24z"/></svg>      
                        Tech Specs
                    </a>



<a href="#" role="button">
        <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path d="M288 32c0-17.7-14.3-32-32-32s-32 14.3-32 32V274.7l-73.4-73.4c-12.5-12.5-32.8-12.5-45.3 0s-12.5 32.8 0 45.3l128 128c12.5 12.5 32.8 12.5 45.3 0l128-128c12.5-12.5 12.5-32.8 0-45.3s-32.8-12.5-45.3 0L288 274.7V32zM64 352c-35.3 0-64 28.7-64 64v32c0 35.3 28.7 64 64 64H448c35.3 0 64-28.7 64-64V416c0-35.3-28.7-64-64-64H346.5l-45.3 45.3c-25 25-65.5 25-90.5 0L165.5 352H64zM432 456c-13.3 0-24-10.7-24-24s10.7-24 24-24s24 10.7 24 24s-10.7 24-24 24z"/></svg>      
                        Download
                    </a>

I only want to grab the Tech Specs and the Download links, nothing more, nothing less. For this I wrote the regex /<a href="(.*)">[\s\S]*(Download|Tech Specs)[\s\S]*<\/a>/mgUu, which unfortunately also catches the Download the warranty link. How can I change my pattern to exclude it? I know it has to do with negative look-arounds, but I can't figure it out. Also, in the $matches array I need the matched text in a capturing group alongside the link, so that I know which link is which. TIA.

https://regex101.com/r/cvXzkS/1

CodePudding user response:

See this demo: https://regex101.com/r/wztpJQ/1

It uses this regex: (?<=<a href=")(?P<link>[^"]*)(?=" .*>\n.*\n\t*(?P<name>.*Specs|.*Download)\n.*<\/a>)

It matches the href value only if the link text ends with one of the wanted words right before the closing </a> tag. In other words, the match is decided by the last word before </a>, which is why Download the warranty is skipped.

Note 2: the demo uses named groups (link and name), so both the href and the matched text are available by name.
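
For comparison, here is a minimal sketch of a different way to get the same result. It does not depend on the exact line breaks and tabs in the markup. It is not the regex from the demo above: the pattern, the find_links helper and the shortened sample (with made-up href values so the links can be told apart) are all assumptions for illustration. The idea is to require that the wanted phrase is the only text between </svg> and </a>, and to stop the icon-skipping part from running past a </a> into the next link.

```python
import re

# Hypothetical pattern (not the one from the demo): the label must be the
# only text between the closing </svg> and the closing </a>.
LINK_RE = re.compile(
    r'<a\s+href="(?P<link>[^"]*)"[^>]*>'    # opening tag; capture the href value
    r'(?:(?!</a>).)*?</svg>'                # skip the icon without crossing a </a>
    r'\s*(?P<name>Download|Tech Specs)\s*'  # label, surrounded only by whitespace
    r'</a>',
    re.DOTALL,                              # let "." match newlines as well
)


def find_links(html):
    """Return (href, label) pairs for links whose whole label is a wanted phrase."""
    return [(m.group("link"), m.group("name")) for m in LINK_RE.finditer(html)]


# Shortened stand-in for the markup in the question; the hrefs are invented.
sample = """
<a href="/warranty" role="button">
    <svg viewBox="0 0 512 512"><path d="M288 32"/></svg>
        Download the warranty
</a>
<a href="/specs" role="button">
    <svg viewBox="0 0 512 512"><path d="M288 32"/></svg>
        Tech Specs
</a>
<a href="/download" role="button">
    <svg viewBox="0 0 512 512"><path d="M288 32"/></svg>
        Download
</a>
"""

print(find_links(sample))
# [('/specs', 'Tech Specs'), ('/download', 'Download')]
```

The constructs used here (named groups, a negative lookahead, a lazy quantifier) also exist in PCRE, so the same pattern should carry over to preg_match_all with the s flag, as long as the U flag is left off because it inverts greediness. A negative lookahead such as Download(?!\s+the warranty) would exclude that one link too, but every unwanted phrase would have to be listed.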
