I am new to web scraping and I'm trying to use Scrapy to scrape the Release Date for the following website: https://m.imdb.com/title/tt0468569/?ref_=adv_li_tt
This is the selector I am using:
//a[contains(@class,'ipc-metadata-list-item__list-content-item ipc-metadata-list-item__list-content-item--link')]/text()
It returns too many elements, and I just want the release date string.
CodePudding user response:
To be more specific and get only the release date text, adjust your XPath like this:
//li[contains(@data-testid,'title-details-releasedate')]//li/a/text()
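It selects the <li> whose data-testid attribute contains the value title-details-releasedate. Because that element contains two <a> elements (the "Release date" label and the date itself), the path targets only the <a> that sits inside a nested <li>, which is the one holding the date.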
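
To see why the nested <li> step matters, here is a minimal sketch that runs the same idea against a small, hypothetical HTML fragment modeled on the structure described above (the real IMDb markup is larger and may differ or change). It uses Python's standard-library ElementTree rather than Scrapy so it runs without extra installs. ElementTree only supports a subset of XPath (no contains() or text()), so the sketch matches the attribute exactly and reads .text from the element. The names SAMPLE_HTML and extract_release_date are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; placeholder data, not copied from the real page.
SAMPLE_HTML = """
<ul>
  <li data-testid="title-details-releasedate">
    <a href="/releaseinfo">Release date</a>
    <div>
      <ul>
        <li><a href="/releaseinfo">July 18, 2008 (United States)</a></li>
      </ul>
    </div>
  </li>
  <li data-testid="title-details-origin">
    <a href="/country">Country of origin</a>
    <div>
      <ul>
        <li><a href="/country/us">United States</a></li>
      </ul>
    </div>
  </li>
</ul>
"""


def extract_release_date(markup):
    """Return the release date text, or None if the element is missing."""
    root = ET.fromstring(markup)
    # Outer <li> is picked by its data-testid; the trailing //li/a skips the
    # "Release date" label link and keeps only the link inside a nested <li>.
    link = root.find(".//li[@data-testid='title-details-releasedate']//li/a")
    if link is None or not link.text:
        return None
    return link.text.strip()


release_date = extract_release_date(SAMPLE_HTML)
print(release_date)
```

In a Scrapy callback, the equivalent would presumably be something like response.xpath("//li[contains(@data-testid,'title-details-releasedate')]//li/a/text()").get(), which returns the first matching string, or None when nothing matches.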