I'm working on a project using Scrapy, and I have an HTML file with content like the line below. I would like to extract the title values, e.g. "ELK set up for creating a SIEM Solution_Upwork Request".
<a href="https://discuss.elastic.co/t/elk-set-up-for-creating-a-siem-solution-upwork-request/286299" class="title raw-link raw-topic-link">ELK set up for creating a SIEM Solution_Upwork Request</a>
I am selecting all the title elements on the page using:
result = response.xpath('''//*[contains(concat( " ", @class, " " ), concat( " ", "raw-topic-link", " " ))]''')
Printing the result:
[<Selector xpath='//*[contains(concat( " ", @class, " " ), concat( " ", "raw-topic-link", " " ))]' data='<a href="https://discuss.elastic.co/t...'>,
<Selector xpath='//*[contains(concat( " ", @class, " " ), concat( " ", "raw-topic-link", " " ))]' data='<a href="https://discuss.elastic.co/t...'>,
<Selector xpath='//*[contains(concat( " ", @class, " " ), concat( " ", "raw-topic-link", " " ))]' data='<a href="https://discuss.elastic.co/t...'>,
<Selector xpath='//*[contains(concat( " ", @class, " " ), concat( " ", "raw-topic-link", " " ))]' data='<a href="https://discuss.elastic.co/t...'>,
...
I have tried
result.xpath("""//[@id="raw-topic-link"]/text()""").extract()
but I get either an empty list or an invalid expression error. Any idea how to solve this? Are there any useful online resources for learning the different ways to extract values from divs, classes, links, and more?
CodePudding user response:
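The expression you tried is invalid because //[...] has no element name (or *) after the //, and raw-topic-link is a class value, not an id. You can try the following instead: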
response.xpath('//a[@class="title raw-link raw-topic-link"]/text()').getall()  # or .get() for just the first match