Start_urls not getting parsed

Time:09-27

The code below closely follows what most tutorials show for Scrapy's FormRequest, but no matter what variation I try, I can't get it to work. My understanding (and maybe I'm dead wrong) is that each URL in start_urls is handed off to the parse method, which starts the process of scraping the site. Whenever I run this script, it just sets start_urls to the URL and then treats parse like an uncalled function, skipping it entirely. I'm not sure what I'm doing wrong, but it's driving me nuts!

import scrapy
from scrapy.http import FormRequest


def authentication_failed(response):
    # TODO: check the contents of the response and return True if the
    # login failed, or False if it succeeded.
    pass


class LoginSpider(scrapy.Spider):
    name = 'example.com'
    start_urls = ["https://app.hubspot.com/login"]

    def parse(self, response):
        # Read credentials from a file; strip() removes the trailing
        # newline that readlines() keeps on each line.
        with open("/PATH/auth.txt", "r") as f:
            lines = f.readlines()
        username = lines[0].strip()
        password = lines[1].strip()
        yield FormRequest.from_response(
            response,
            formdata={'email': username, 'password': password},
            # Pass the method itself, don't call it: the original
            # callback=self.after_login(self, response) invoked the
            # method immediately and handed its return value (None)
            # to FormRequest, so after_login never ran on the response.
            callback=self.after_login,
        )

    def after_login(self, response):
        if authentication_failed(response):
            self.logger.error("Login failed")
            return

CodePudding user response:

It is getting passed to the parse function, and this is the page: [screenshot of the response page]

The page is using JavaScript to check your browser, so the plain HTTP response never contains the login form; try using a headless browser.
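One way to drive a headless browser from inside Scrapy is the third-party scrapy-playwright download handler (named here as an assumption; the answer doesn't specify a tool, and other options such as Splash or Selenium exist). A minimal settings sketch, assuming the package is installed (`pip install scrapy-playwright` plus `playwright install`):

```python
# settings.py fragment: route requests through Playwright's headless
# Chromium instead of Scrapy's default HTTP handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

Individual requests then opt in with `meta={"playwright": True}`, so the response the spider receives is the JavaScript-rendered page.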
