I am trying to use Scrapy and Playwright to scrape dynamic web pages. I installed both, but when I try to run my spider I get this error:
ImportError: cannot import name 'PageCoroutine' from 'scrapy_playwright.page' (C:\Ali\DataCamp\Web Scraping in Python\Scrapy\venv\lib\site-packages\scrapy_playwright\page.py)
This is my code (it's just a test):
import scrapy
from scrapy_playwright.page import PageCoroutine


class PwspiderSpider(scrapy.Spider):
    name = 'pwspider'

    def start_requests(self):
        yield scrapy.Request(
            "https://shoppable-campaign-demo.netlify.app/#/",
            meta=dict(playwright=True, playwright_include_page=True,
                      playwright_page_coroutine=[PageCoroutine('wait_for_selector', 'div#productListing')]),
        )

    async def parse(self, response):
        yield {'text': response.text}
I have also added the DOWNLOAD_HANDLERS and TWISTED_REACTOR settings to the settings file.
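For reference, those two settings usually look like the sketch below. The values are the ones given in the scrapy-playwright README; the file name settings.py is an assumption about the project layout, and nothing here is copied from the asker's actual file.

```python
# settings.py (assumed file name) -- minimal scrapy-playwright configuration sketch.

# Route both HTTP and HTTPS requests through the Playwright download handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright needs Twisted to run on top of the asyncio event loop.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

Neither setting affects the ImportError: it is raised when the spider module is imported, before any request is scheduled.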
CodePudding user response:
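PageCoroutine is deprecated/obsolete and can no longer be imported from scrapy_playwright.page in your installed version, which is exactly what the ImportError reports. Import PageMethod instead and pass it in the playwright_page_methods meta key.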
Working code as an example:
import scrapy
from scrapy_playwright.page import PageMethod


class TestSpider(scrapy.Spider):
    name = "test"

    def start_requests(self):
        yield scrapy.Request(
            url="https://shoppable-campaign-demo.netlify.app/#/",
            callback=self.parse,
            meta={
                "playwright": True,
                # Wait for the product cards to render before the response is returned.
                "playwright_page_methods": [
                    PageMethod("wait_for_selector", '.card-body'),
                ],
            },
        )

    def parse(self, response):
        # Select each product container, then query relative to it for the title.
        products = response.xpath('//*[@]')
        for product in products:
            yield {
                'title': product.xpath('.//*[@]/text()').get()
            }
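The attribute predicates in the two XPath expressions above are blank, so the following is only a sketch of the same two-step pattern: select every product container, then run a relative query inside each one. The class names card-body and card-title and the sample markup are assumptions for illustration, not taken from the real page, and the standard library's ElementTree stands in for Scrapy's selectors so the snippet runs without a browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the class names are assumptions, not the real page's.
SAMPLE = """
<div>
  <div class="card-body"><h5 class="card-title">Oxford Loafers</h5></div>
  <div class="card-body"><h5 class="card-title">Short Blazer</h5></div>
</div>
"""


def extract_titles(markup):
    """Return one {'title': ...} dict per product container in the markup."""
    root = ET.fromstring(markup)
    items = []
    # Step 1: select every product container anywhere in the tree.
    for product in root.findall(".//*[@class='card-body']"):
        # Step 2: query relative to the container (the './/' prefix).
        title = product.find(".//*[@class='card-title']")
        items.append({"title": title.text if title is not None else None})
    return items


print(extract_titles(SAMPLE))
```

In the spider itself, the equivalent calls would be response.xpath(...) for the containers and product.xpath(...).get() for each title, with whatever attribute values the target page actually uses.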
Output from running the test spider:
{'title': 'Oxford Loafers'}
2022-11-05 20:40:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://shoppable-campaign-demo.netlify.app/#/>
{'title': 'Ankle-length Slack'}
2022-11-05 20:40:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://shoppable-campaign-demo.netlify.app/#/>
{'title': 'White Baseball Cap'}
2022-11-05 20:40:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://shoppable-campaign-demo.netlify.app/#/>
{'title': 'Triangle Bikini Top'}
2022-11-05 20:40:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://shoppable-campaign-demo.netlify.app/#/>
{'title': 'Short Blazer'}
2022-11-05 20:40:40 [scrapy.core.engine] INFO: Closing spider (finished)
2022-11-05 20:40:40 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 235,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 39851,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 41.370211,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2022, 11, 5, 14, 40, 40, 261151),
'item_scraped_count': 5,