Trying to Scrape from website with login Python

Time:11-28

I am trying to scrape my data from a website that requires a login, but I keep getting the following error:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>MethodNotAllowed</Code><Message>The specified method is not allowed against this resource.</Message><Method>POST</Method><ResourceType>OBJECT</ResourceType><RequestId>DCVJZ8D4R3PK45M1</RequestId><HostId>PIra5vNbfC5d1TfFZ3hABXk9eIsKwtJm5bYH4Bozu4nS4InkGEILNflPPzdvT9hUpQOPaW0AZBA=</HostId></Error>
```
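The error body is XML, so when you try different URLs and methods it can help to pull out the `Code` and `Message` fields instead of reading the raw text. Below is a minimal sketch that uses only the standard library; `parse_xml_error` is a hypothetical helper name. The `RequestId`/`HostId` fields look like an Amazon S3-style error response. That would hint that the front-page URL serves static files and rejects POST, but the thread does not confirm it.

```python
import xml.etree.ElementTree as ET


def parse_xml_error(body: bytes) -> dict:
    """Map each child of the <Error> root (Code, Message, Method, ...) to its text."""
    root = ET.fromstring(body)
    return {child.tag: (child.text or "") for child in root}


# The body from the question, with RequestId/HostId left out
error_body = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b"<Error><Code>MethodNotAllowed</Code>"
    b"<Message>The specified method is not allowed against this resource.</Message>"
    b"<Method>POST</Method><ResourceType>OBJECT</ResourceType></Error>"
)

info = parse_xml_error(error_body)
print(info["Code"], info["Method"])  # MethodNotAllowed POST
```

With the question's script, `parse_xml_error(r.content)` would return the same fields from the live response.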

Python script:

```python
import requests


loginurl = ("https://cbscarrickonsuir.app.vsware.ie/")
secure_url = ("https://cbscarrickonsuir.app.vsware.ie/11571471/behaviour")
payload = {"username":"REMOVED","password":"REMOVED","source":"web"}
r = requests.post(loginurl, data=payload)
print(r.text)
```

I had to remove the username and password because this is a live website. I don't know how to do this. I followed a …

[Screenshot: Request Headers]

[Screenshot: Example request body]

It would be a good idea to click through the requests listed under XHR in your browser's developer tools (Network tab) and look at the headers, request and response of each. That tells you exactly which API endpoint you should be using, which HTTP method, what request body format is expected, and what kind of response you will receive.
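Once you have found the right request in the Network tab, one way to check that your script reproduces it is to build the same request with `requests` without sending it, then compare method, URL, headers and body side by side. This is a minimal sketch: the URL, headers and body below are hypothetical stand-ins for whatever the Network tab actually shows.

```python
import json

import requests

# Values as you might read them off the Network tab (all hypothetical)
observed_method = "POST"
observed_url = "https://example.com/api/login"
observed_headers = {"Accept": "application/json"}
observed_body = {"username": "YOUR_USERNAME", "password": "YOUR_PASSWORD"}

# Build the request but do not send it, so it can be compared with the browser's
prepared = requests.Request(
    observed_method, observed_url, headers=observed_headers, json=observed_body
).prepare()

print(prepared.method, prepared.url)
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
# Pretty-print the JSON body that would be sent
print(json.dumps(json.loads(prepared.body), indent=2))
```

If anything here differs from the browser's request (a missing header, form data instead of JSON, a different host), that difference is the first thing to fix.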

Edit: You will probably also need a persistent session to scrape any data that requires you to log in first. Go through these:

  1. Python Requests and persistent sessions
  2. https://requests.kennethreitz.org/en/master/user/advanced/#session-objects
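Here is a minimal sketch of what a persistent session gives you. Once a cookie is in `Session.cookies` (normally the server's login response puts it there), every request prepared through that session carries it, and a standalone request does not. The cookie name, value and domain are hypothetical, and nothing is sent over the network.

```python
import requests

s = requests.Session()

# Stand-in for a cookie the server would set after a successful login
s.cookies.set("sessionid", "abc123", domain="example.com")

# A request built through the session picks up the stored cookie automatically
prepared = s.prepare_request(requests.Request("GET", "https://example.com/protected"))
print(prepared.headers.get("Cookie"))  # sessionid=abc123

# The same request built without the session has no Cookie header
plain = requests.Request("GET", "https://example.com/protected").prepare()
print(plain.headers.get("Cookie"))  # None
```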

Answer:

There are two mistakes in your code.

  1. You send the data to the main page, but the browser sends it to https://cbscarrickonsuir.vsware.ie/tokenapiV2/login

  2. You send the data as form data, but the browser sends it as JSON, so you need json=payload instead of data=payload.
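To see the difference between the two options, the sketch below prepares (but does not send) the same payload both ways. `example.com` is a placeholder URL.

```python
import json

import requests

payload = {"username": "YOUR_USERNAME", "password": "YOUR_PASSWORD", "source": "web"}
url = "https://example.com/login"  # placeholder; nothing is sent

# data= encodes the dict as an HTML form body
form_req = requests.Request("POST", url, data=payload).prepare()
# json= serialises the dict to JSON and sets a JSON Content-Type
json_req = requests.Request("POST", url, json=payload).prepare()

print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)                     # username=YOUR_USERNAME&password=YOUR_PASSWORD&source=web
print(json_req.headers["Content-Type"])  # application/json
print(json.loads(json_req.body))         # the original dict, decoded from the JSON body
```

A server that expects JSON will usually fail to find the fields in a form-encoded body, even when the URL is correct.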

Another problem may be that you don't use Session(), which sends cookies automatically. Servers typically use cookies to remember that you have already logged in, so if you don't send them back, the server doesn't know that you are logged in.

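```python
import requests

url = "https://cbscarrickonsuir.app.vsware.ie/"  # the app's front page (not used below)

# The endpoint the browser actually sends the login request to
login_url = 'https://cbscarrickonsuir.vsware.ie/tokenapiV2/login'

payload = {
    "username": "YOUR_USERNAME",  # placeholder: use your own credentials
    "password": "YOUR_PASSWORD",  # placeholder
    "source": "web"
}

# A Session keeps cookies between requests, so later requests stay logged in
s = requests.Session()

# json= sends the payload as a JSON body, as the browser does
r = s.post(login_url, json=payload)

print('status:', r.status_code)
print('--- text ---')
print(r.text)
print('----------------')
```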

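I don't have an account to log in with, but now the request gets status 401 with the message invalid_username_password instead of MethodNotAllowed. This suggests the request reaches the login API and only the credentials are rejected: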

```
status: 401
--- text ---
{"fieldErrors":[],"genericErrors":[{"messageKey":"invalid_username_password","metadata":null}]}
```
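Because the API returns a JSON body that lists error keys, a small helper can show them instead of printing the raw text. This sketch is based only on the single response shown above. `error_keys` is a hypothetical name, and the shape of `fieldErrors` entries is an assumption because that list was empty here.

```python
import json


def error_keys(body: str) -> list:
    """Collect messageKey values from the fieldErrors and genericErrors lists."""
    data = json.loads(body)
    keys = []
    for group in ("fieldErrors", "genericErrors"):
        for err in data.get(group, []):
            # Assumes fieldErrors entries look like genericErrors entries (unverified)
            key = err.get("messageKey")
            if key:
                keys.append(key)
    return keys


body = '{"fieldErrors":[],"genericErrors":[{"messageKey":"invalid_username_password","metadata":null}]}'
print(error_keys(body))  # ['invalid_username_password']
```

With the answer's script, you could call `error_keys(r.text)` when `r.status_code` is 401.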