Scraping an ASP.NET site does not work when using a function in Selenium in Python

Time:09-28

I want to scrape a .NET website, so I wrote this code:

import time

import scrapy
from scrapy import Selector
from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager


class BoursakuwaitSpider(scrapy.Spider):
    name = 'boursakuwait'
    custom_settings = {
        'FEED_URI': 'second.json',
        'FEED_FORMAT': 'json',
    }
    start_urls = ['https://casierjudiciaire.justice.gov.ma/verification.aspx']

    def parse(self, no_response):
        browser = webdriver.Chrome(executable_path=ChromeDriverManager().install())
        browser.get('https://casierjudiciaire.justice.gov.ma/verification.aspx')
        time.sleep(10)
        response = Selector(text=browser.page_source)

When I put the Selenium code inside the parse function, it does not work, but if I put it directly in the class body like this:

import time

import scrapy
from scrapy import Selector
from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager


class BoursakuwaitSpider(scrapy.Spider):
    name = 'boursakuwait'
    custom_settings = {
        'FEED_URI': 'second.json',
        'FEED_FORMAT': 'json',
    }
    start_urls = ['https://casierjudiciaire.justice.gov.ma/verification.aspx']


    browser = webdriver.Chrome(executable_path=ChromeDriverManager().install())
    browser.get('https://casierjudiciaire.justice.gov.ma/verification.aspx')
    time.sleep(10)
    response = Selector(text=browser.page_source)

This code works correctly. However, I want to use the function version (the first snippet), and I don't know where the problem is. Any help would be appreciated.

CodePudding user response:

  1. First, define a start_requests(self) method and set up all the Selenium dependencies inside it.

  2. To share the browser (driver) between methods, store it as an attribute on self (self.browser) instead of a local variable. The following code works:

Example:

import time
import scrapy
from scrapy import Selector
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from scrapy.crawler import CrawlerProcess
from selenium.webdriver.chrome.options import Options

class BoursakuwaitSpider(scrapy.Spider):
    name = 'boursakuwait'
    # custom_settings = {
    #     'FEED_URI': 'second.json',
    #     'FEED_FORMAT': 'json',
    # }

    def start_requests(self):
        # Create the browser once and keep it on self so parse() can reuse it.
        options = webdriver.ChromeOptions()
        options.add_argument("start-maximized")
        # "detach" keeps the Chrome window open after the script finishes.
        options.add_experimental_option("detach", True)
        url = 'https://stackoverflow.com'
        self.browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
        self.browser.get(url)
        time.sleep(5)

        yield scrapy.Request(
            url='https://stackoverflow.com',
            callback=self.parse
        )

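    def parse(self, response):
        # Reuse the browser that start_requests stored on self.
        self.browser.get(response.url)
        time.sleep(5)
        # Uncomment to query the rendered page with Scrapy selectors:
        # response = Selector(text=self.browser.page_source)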


if __name__ == "__main__":
    process = CrawlerProcess()
    process.crawl(BoursakuwaitSpider)
    process.start()
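
One Python detail may help explain the difference between the two snippets in the question: statements placed directly in a class body run once, as soon as the class is defined, whereas a method body runs only when something calls it. So if Scrapy never invokes parse (for example because the initial request fails, which is only an assumption here), the Selenium code inside it never runs. The sketch below illustrates this, together with sharing an object between methods through self. FakeBrowser and DemoSpider are hypothetical stand-ins, not Scrapy or Selenium classes, so the snippet runs without a real browser.

```python
# Hypothetical stand-ins: FakeBrowser and DemoSpider are illustrative names,
# not part of Scrapy or Selenium.
events = []


class FakeBrowser:
    """Plays the role of webdriver.Chrome without opening a real browser."""

    def __init__(self):
        self.current_url = None

    def get(self, url):
        self.current_url = url
        events.append(f"browser.get({url})")


class DemoSpider:
    # Class-body statements run once, as soon as Python defines the class.
    events.append("class body executed")

    def start_requests(self):
        # Storing the browser on self makes it reachable from other methods.
        self.browser = FakeBrowser()
        events.append("start_requests executed")

    def parse(self, url):
        # Runs only when a caller (Scrapy, in the real spider) invokes it.
        self.browser.get(url)
        events.append("parse executed")


# Defining the class already triggered the class-body line, and nothing else.
assert events == ["class body executed"]

spider = DemoSpider()
spider.start_requests()
spider.parse("https://example.com")
print(events)
```

In the real spider, Scrapy plays the role of the caller: it runs start_requests first and calls parse only after a request has produced a response.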