I'm currently trying to scrape a title with Scrapy's shell. The website is https://cineb.art/movies, and the title is located in:
`<h3 class="film-name">`
When I type `response.css("film-name").get()`, I get nothing back.
```
>>> fetch('https://cineb.art/movies')
2022-11-01 12:17:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://cineb.art/movies> (referer: None)
>>> response.css('film-title').get()
>>> response.css('film-name').get()
```
That's my shell log. Any help is appreciated, thanks in advance!
Answer:
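The title text sits in an `a` tag under the `h3` tag, so you need to select that `a` and take its text. Note also that `film-name` is a class, so the CSS selector needs a leading dot: a bare `film-name` looks for an element named `<film-name>`, which doesn't exist, and that is why your query returned nothing.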
```python
response.css("h3.film-name a::text").get()
```
Answer:
Here is an example spider that uses an XPath locator instead:
```python
import scrapy


class TestSpider(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        yield scrapy.Request('https://cineb.art/movies', callback=self.parse)

    def parse(self, response):
        # Each title is the text of the <a> inside <h3 class="film-name">
        for link in response.xpath('//*[@class="film-name"]/a'):
            yield {'title': link.xpath('./text()').get()}
```
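If you want to sanity-check the XPath without Scrapy, Python's standard-library `xml.etree.ElementTree` supports a limited XPath subset that covers this expression. This is a swapped-in technique, not what the spider uses: ElementTree needs well-formed markup, so it only works on a tidy hypothetical snippet like the one below, not on the real page. `extract_titles` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet modelled on the question's markup.
SAMPLE_HTML = """
<div>
  <h3 class="film-name"><a href="/movie/1">Wickensburg</a></h3>
  <h3 class="film-name"><a href="/movie/2">The Lair</a></h3>
</div>
""".strip()


def extract_titles(markup):
    """Hypothetical helper: return the text of every <a> under a class="film-name" element."""
    root = ET.fromstring(markup)
    # ElementTree requires a relative path, so './/' replaces the leading '//'.
    return [a.text for a in root.findall(".//*[@class='film-name']/a")]


print(extract_titles(SAMPLE_HTML))  # ['Wickensburg', 'The Lair']
```

An exact-match predicate like `[@class='film-name']` misses an element that carries extra classes (e.g. `class="film-name bold"`); the CSS selector `.film-name` matches the class token and does not have that limitation.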
Output of the spider above:
{'title': 'Wickensburg'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'The Lair'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'The Deal'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Two Witches'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Merry Swissmas'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Sissy'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Bring It On: Cheer or Die'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'An Amish Sin'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Jolly Good Christmas'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'A Savannah Haunting'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Wild Is the Wind'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'A Chance Encounter'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Ghosts of Flight 401'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'My Nightmare Office Affair'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'The Love Box in Your Living Room'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Beyond the Universe'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Robbing Mussolini'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'A Tree of Life: The Pittsburgh Synagogue Shooting'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Operation Seawolf'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Blade of the 47 Ronin'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Fortune Feimster: Good Fortune'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Kings of Coke'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Shady Grove'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'House of Clowns'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'The Freeway Killer: Lost Murder Tapes'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'We Need a Little Christmas'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'We Wish You a Married Christmas'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Swindler Seduction'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Sinphony: A Clubhouse Horror Anthology'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'American Murderer'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Dangerous Game: The Legacy Murders'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Wendell & Wild'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Detective Knight: Rogue'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Slayers'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'The Domestic'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'The Pez Outlaw'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Where Are You'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Descendant'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Terror Train'}
2022-11-01 15:35:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cineb.art/movies>
{'title': 'Noel Next Door'}
2022-11-01 15:35:30 [scrapy.core.engine] INFO: Closing spider (finished)
2022-11-01 15:35:30 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 292,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 14043,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 1.566455,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2022, 11, 1, 9, 35, 30, 341770),
'httpcompression/response_bytes': 142888,
'httpcompression/response_count': 1,
'item_scraped_count': 40,