Skipping HTML tag within Scrapy

Time:09-22

I am scraping data from a website using Scrapy (Python 3), and I would like to skip one <a> tag within the source code, because there are two of them and both have the same classes, as you can see in the source below:


I am trying to select the second of these <a> tags, the one that links to the next page.

I'm using this: response.xpath("//nav[@class='mp-PaginationControls-pagination']/a/@href").get(), but that only lets me select the first <a> tag, so it breaks once I'm on page two.

Here is the raw HTML:

<div>
  <nav class="mp-PaginationControls-pagination">
    <a href="/l/muziek-en-instrumenten/microfoons/">
      <span aria-hidden="true"></span>
    </a>
    <span>
      <a href="/l/muziek-en-instrumenten/microfoons/">1</a>
      <span>2</span>
      <a href="/l/muziek-en-instrumenten/microfoons/p/3/">3</a>
      <span>...</span>
      <span>142</span>
    </span>
    <span>Pagina 2 van 142</span>
    <a href="/l/muziek-en-instrumenten/microfoons/p/3/">
      <span aria-hidden="true" class="mp-svg-arrow-right--inverse"></span>
    </a>
  </nav>
</div>

Thanks in advance

Answer:

From the HTML you shared, the second <a> has a different href value, but since that href is exactly what you want to extract, you can't build the XPath on it.
However, each arrow <a> wraps a <span> icon, and the icon inside the "next" link has the class mp-svg-arrow-right--inverse, so you can select the parent <a> through that child <span>.
For example:

response.xpath("//nav[@class='mp-PaginationControls-pagination']//a[./span[contains(@class,'mp-svg-arrow-right--inverse')]]/@href").get()