Struggling to scrape titles with Scrapy

Time:11-01

I am trying to scrape movie titles in the Scrapy shell. The website is https://cineb.art/movies, and each title is located in:

   <h3 class="film-name">

When I type response.css("film-name").get(), I get nothing in return.

>>> fetch('https://cineb.art/movies')
2022-11-01 12:17:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cineb.art/movies> (referer: None)
>>> response.css('film-title').get()
>>> response.css('film-name').get()

This is my log. Help me out! Thanks in advance!

CodePudding user response:

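There is an "a" tag under the "h3" tag, and you need to get its text. Also, "film-name" is a class, so the selector needs a leading dot; without it, "film-name" is read as a tag name and matches nothing.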

response.css("h3.film-name a::text").get()

CodePudding user response:

Below is an example spider that uses an XPath locator instead:

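import scrapy


class TestSpider(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        # Request the movie listing page and pass the response to parse().
        yield scrapy.Request(
            'https://cineb.art/movies',
            callback=self.parse,
        )

    def parse(self, response):
        # Select the <a> inside each element whose class is "film-name".
        for link in response.xpath('//*[@class="film-name"]/a'):
            yield {'title': link.xpath('./text()').get()}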

Output:

{'title': 'Wickensburg'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'The Lair'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'The Deal'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Two Witches'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Merry Swissmas'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Sissy'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Bring It On: Cheer or Die'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'An Amish Sin'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Jolly Good Christmas'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'A Savannah Haunting'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Wild Is the Wind'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'A Chance Encounter'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Ghosts of Flight 401'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'My Nightmare Office Affair'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'The Love Box in Your Living Room'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Beyond the Universe'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Robbing Mussolini'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'A Tree of Life: The Pittsburgh Synagogue Shooting'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Operation Seawolf'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Blade of the 47 Ronin'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Fortune Feimster: Good Fortune'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Kings of Coke'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Shady Grove'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'House of Clowns'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'The Freeway Killer: Lost Murder Tapes'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'We Need a Little Christmas'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'We Wish You a Married Christmas'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Swindler Seduction'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Sinphony: A Clubhouse Horror Anthology'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'American Murderer'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Dangerous Game: The Legacy Murders'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Wendell & Wild'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Detective Knight: Rogue'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Slayers'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'The Domestic'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'The Pez Outlaw'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Where Are You'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Descendant'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Terror Train'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Noel Next Door'}
2022-11-01 15:35:30 [scrapy.core.engine] INFO: Closing spider (finished)
2022-11-01 15:35:30 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 292,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 14043,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'elapsed_time_seconds': 1.566455,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2022, 11, 1, 9, 35, 30, 341770),
 'httpcompression/response_bytes': 142888,
 'httpcompression/response_count': 1,
 'item_scraped_count': 40,