cannot import name 'PageCoroutine' from 'scrapy_playwright.page'

Time:11-06

I am trying to use Scrapy and Playwright to scrape dynamic webpages. I installed scrapy and playwright, but when I run my spider I get this error:

ImportError: cannot import name 'PageCoroutine' from 'scrapy_playwright.page' (C:\Ali\DataCamp\Web Scraping in Python\Scrapy\venv\lib\site-packages\scrapy_playwright\page.py)

This is my code (a test spider):

import scrapy
from scrapy_playwright.page import PageCoroutine

class PwspiderSpider(scrapy.Spider):
    name = 'pwspider'
    
    def start_requests(self):
        yield scrapy.Request("https://shoppable-campaign-demo.netlify.app/#/", meta=dict(playwright=True, playwright_include_page=True, playwright_page_coroutine=[PageCoroutine('wait_for_selector', 'div#productListing')]))

    async def parse(self, response):
        yield {'text': response.text}

I even added the DOWNLOAD_HANDLERS and the TWISTED_REACTOR in the settings file.
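
For reference, the two settings mentioned above usually look like the following in settings.py, as described in the scrapy-playwright README. This is a sketch of the standard configuration, not the asker's actual file, which was not posted.

```python
# settings.py (sketch; the asker's actual settings file was not shown)

# Route HTTP and HTTPS downloads through the Playwright-backed handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

These settings are unrelated to the ImportError itself, which is raised at import time before any request is scheduled.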

Answer:

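PageCoroutine is deprecated and no longer exists in recent scrapy-playwright releases, which is why the import fails. Import PageMethod instead and pass it in the playwright_page_methods meta key.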
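
If it is unclear which of the two names an installed release provides, a small check like the one below can help. It is only a sketch: `exported_names` is a hypothetical helper written for this illustration, not part of scrapy-playwright.

```python
import importlib


def exported_names(module_name, candidates):
    """Return the subset of `candidates` that `module_name` actually defines.

    Returns an empty list if the module itself cannot be imported.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return []
    return [name for name in candidates if hasattr(module, name)]


# Example call (the result depends on the installed scrapy-playwright version):
# exported_names("scrapy_playwright.page", ["PageCoroutine", "PageMethod"])
```

On an installation that raises the ImportError from the question, the result would be expected to contain PageMethod but not PageCoroutine.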

Working code as an example:

import scrapy
from scrapy_playwright.page import PageMethod

class TestSpider(scrapy.Spider):
    name = "test"
    def start_requests(self):
        yield scrapy.Request(
            url="https://shoppable-campaign-demo.netlify.app/#/",
            callback=self.parse,
            meta={
                "playwright": True,
                "playwright_page_methods": [
                    PageMethod("wait_for_selector", '.card-body'),
                ],
            },
        )

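    def parse(self, response):
        # Each product is a card; "card-body" is the class waited on above.
        # "card-title" is an assumed class name for the title element;
        # adjust it to the page's actual markup.
        products = response.xpath('//*[@class="card-body"]')
        for product in products:
            yield {
                'title': product.xpath('.//*[@class="card-title"]/text()').get()
            }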

Output:

{'title': 'Oxford Loafers'}
2022-11-05 20:40:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://shoppable-campaign-demo.netlify.app/#/>
{'title': 'Ankle-length Slack'}
2022-11-05 20:40:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://shoppable-campaign-demo.netlify.app/#/>
{'title': 'White Baseball Cap'}
2022-11-05 20:40:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://shoppable-campaign-demo.netlify.app/#/>
{'title': 'Triangle Bikini Top'}
2022-11-05 20:40:40 [scrapy.core.scraper] DEBUG: Scraped from <200 https://shoppable-campaign-demo.netlify.app/#/>
{'title': 'Short Blazer'}
2022-11-05 20:40:40 [scrapy.core.engine] INFO: Closing spider (finished)
2022-11-05 20:40:40 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 235,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 39851,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'elapsed_time_seconds': 41.370211,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2022, 11, 5, 14, 40, 40, 261151),
 'item_scraped_count': 5,