The code below is pretty close to what you would see in most tutorials for Scrapy's FormRequest, but no matter what variation I try, I can't seem to get it to work. It's my understanding (and maybe I'm dead wrong) that start_urls is supposed to hand each URL off to the parse method, which starts the process of scraping the site. Whenever I run this script, it just sets start_urls to the URL and then treats parse like an uncalled function (skipping it). I'm not sure what I'm doing wrong, but it's driving me nuts!
import requests
import scrapy
from scrapy import Spider
from scrapy.http import FormRequest

def authentication_failed(response):
    # TODO: Check the contents of the response and return True if it failed
    # or False if it succeeded.
    pass

class LoginSpider(scrapy.Spider):
    name = 'example.com'
    start_urls = ["https://app.hubspot.com/login"]

    def parse(self, response):
        f = open("/PATH/auth.txt", "r")
        lines = f.readlines()
        username = lines[0]
        password = lines[1]
        f.close()
        yield scrapy.FormRequest.from_response(
            response,
            formdata={'email': username, 'password': password},
            callback=self.after_login(self, response)
        )

    def after_login(self, response):
        if authentication_failed(response):
            self.logger.error("Login failed")
            return
CodePudding user response:
It is getting passed to the parse function; the page that comes back uses JavaScript to check your browser, so the login form never appears in the HTML that Scrapy receives. Try using a headless browser.
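
Two further notes on the spider itself. First, callback=self.after_login(self, response) calls after_login immediately (with the wrong arguments) instead of handing Scrapy a method to invoke when the login response arrives; pass the method object itself:

yield scrapy.FormRequest.from_response(
    response,
    formdata={'email': username, 'password': password},
    callback=self.after_login
)

Second, lines[0] and lines[1] from readlines() keep their trailing newlines, which can make a correct password fail; strip them with username = lines[0].strip() and password = lines[1].strip().

As for the headless browser, one way to wire it into Scrapy is the scrapy-playwright download handler. The sketch below is a starting point rather than a verified fix for HubSpot's check; it assumes you have run pip install scrapy-playwright and playwright install chromium:

# settings.py: route requests through Playwright's headless browser
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

import scrapy

class LoginSpider(scrapy.Spider):
    name = "example.com"

    def start_requests(self):
        # The "playwright" flag tells scrapy-playwright to render this request
        # in a real (headless) browser, so the JavaScript browser check runs
        # and the login form exists by the time parse() sees the response.
        yield scrapy.Request(
            "https://app.hubspot.com/login",
            meta={"playwright": True},
            callback=self.parse,
        )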