How to get the text of a single tag as one string (without comma separators) using Scrapy

Below is the HTML snippet:

<p class="subtitulo">
 <b>
  <a name="Editores"> Editorial </a>
    Assistant
 </b>
</p>

Using this Scrapy code:

response.css("p.subtitulo *::text").extract()

I get

['Editorial', ' Assistant']

response.css("p.subtitulo *::text").get()

I only get "Assistant". I want the full text as a single string, not a comma-separated list, like:

"Editorial Assistant"

With Beautiful Soup I get the text as one string, without commas. How can I do the same with Scrapy? Since other roles on the page are separated by commas, I don't want to use split().

This is the page URL: http://www.scielo.org.co/revistas/zop/iedboard.htm

Answer:

You can do that by collecting all text nodes with .getall() and joining them into one string with ''.join():

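import scrapy


class TestSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['http://www.scielo.org.co/revistas/zop/iedboard.htm']

    def parse(self, response):
        # Skip the first .subtitulo element and handle the rest one by one
        for p in response.css('.subtitulo')[1:]:
            # ::text selects every text node under p (including those inside
            # <b> and <a>); getall() returns them as a list and ''.join()
            # glues them back into a single string
            yield {
                'Name': ''.join(p.css("::text").getall()),
            }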

Output:

{'Name': 'Editorial Assistant'}
{'Name': 'Editorial Committee '}
{'Name': 'Scientific Committee'}
{'Name': 'Editorial Universidad Del Norte'}
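For a rough, Scrapy-free illustration of what the join does, here is a minimal sketch using only Python's standard html.parser. It assumes the paragraph carries class="subtitulo" (implied by the selector but missing from the pasted HTML) and uses the whitespace from the question's snippet; SubtituloText and join_text_nodes are hypothetical names, not Scrapy APIs. It collects each text node separately, roughly the way p.css("::text").getall() does, then joins them and also collapses surrounding whitespace, which the answer's ''.join() does not (hence the trailing space in 'Editorial Committee ').

```python
from html.parser import HTMLParser

# The snippet from the question, with the class attribute that the
# p.subtitulo selector implies (an assumption: the pasted HTML lost it).
SNIPPET = """
<p class="subtitulo">
 <b>
  <a name="Editores"> Editorial </a>
    Assistant
 </b>
</p>
"""


class SubtituloText(HTMLParser):
    """Collect every text node inside each <p class="subtitulo">.

    Hypothetical helper that mimics p.css("::text").getall(): one string
    per text node, including those nested in <b> and <a>. Assumes every
    start tag inside the paragraph has a matching end tag (no bare void
    elements such as <br>).
    """

    def __init__(self):
        super().__init__()
        self.depth = 0          # > 0 while inside a matching <p>
        self.paragraphs = []    # one list of text nodes per matching <p>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1     # nested tag inside the paragraph
        elif tag == "p" and "subtitulo" in classes:
            self.depth = 1      # entering a matching paragraph
            self.paragraphs.append([])

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.paragraphs[-1].append(data)


def join_text_nodes(nodes):
    """Join text nodes into one string and collapse runs of whitespace."""
    return " ".join("".join(nodes).split())


parser = SubtituloText()
parser.feed(SNIPPET)
titles = [join_text_nodes(nodes) for nodes in parser.paragraphs]
print(titles)  # ['Editorial Assistant']
```

Inside the spider, the same whitespace cleanup could be written as ' '.join(''.join(p.css("::text").getall()).split()). The XPath function normalize-space(), e.g. p.xpath('normalize-space(.)').get(), should give a similar result, though it is untested against this page.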