I am trying to scrape my data from a website that requires a login, but I keep getting the following error:
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>MethodNotAllowed</Code><Message>The specified method is not allowed against this resource.</Message><Method>POST</Method><ResourceType>OBJECT</ResourceType><RequestId>DCVJZ8D4R3PK45M1</RequestId><HostId>PIra5vNbfC5d1TfFZ3hABXk9eIsKwtJm5bYH4Bozu4nS4InkGEILNflPPzdvT9hUpQOPaW0AZBA=</HostId></Error>
Python Script
import requests
loginurl = ("https://cbscarrickonsuir.app.vsware.ie/")
secure_url = ("https://cbscarrickonsuir.app.vsware.ie/11571471/behaviour")
payload = {"username":"REMOVED","password":"REMOVED","source":"web"}
r = requests.post(loginurl, data=payload)
print(r.text)
Had to remove username and password as this is a working website. I don't know how to do this. I followed a
It would be a good idea to click through the requests in the XHR part of the Network tab and look at the headers, request, and response of each one. That shows exactly which API endpoint you should be using, which HTTP method it needs, what request body format it expects, and what kind of response you will receive.
Edit: You will also probably need persistent sessions to scrape any data that requires you to log in first. Go through these:
- Python Requests and persistent sessions
- https://requests.kennethreitz.org/en/master/user/advanced/#session-objects
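As a rough illustration of what a session object adds, the sketch below builds two requests without sending them and compares their headers. The cookie name sessionid and its value are made up for the example; a real server picks its own cookie names and sets them in the login response.

```python
import requests

url = "https://cbscarrickonsuir.app.vsware.ie/"

s = requests.Session()
# Hypothetical cookie, standing in for whatever the server sets at login.
s.cookies.set("sessionid", "abc123")

# Prepared through the session: the stored cookie is attached.
with_session = s.prepare_request(requests.Request("GET", url))
# Prepared on its own: nothing is remembered between requests.
without_session = requests.Request("GET", url).prepare()

print(with_session.headers.get("Cookie"))
print(without_session.headers.get("Cookie"))
```

Nothing is sent over the network here; prepare_request() only shows what the session would send.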
CodePudding user response:
There are two mistakes in your code.
- You send the data to the main page, but the browser sends it to https://cbscarrickonsuir.vsware.ie/tokenapiV2/login
- You send the data as form data, but the browser sends it as JSON, so you need json=payload instead of data=payload.
Another possible problem is that you don't use Session() to send cookies automatically. Most sites that require a login use cookies to remember that you have already logged in. If you don't send the cookies back, the server doesn't know that you are logged in.
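The difference between data=payload and json=payload can be checked without contacting the server. This is a minimal sketch that only prepares the two requests and prints what each would send; the username and password values are placeholders.

```python
import json
import requests

login_url = "https://cbscarrickonsuir.vsware.ie/tokenapiV2/login"
payload = {"username": "user", "password": "secret", "source": "web"}  # placeholder values

# Build both requests without sending them.
form_req = requests.Request("POST", login_url, data=payload).prepare()
json_req = requests.Request("POST", login_url, json=payload).prepare()

# data= encodes the dict as an HTML form body.
print(form_req.headers["Content-Type"])
print(form_req.body)

# json= serializes the dict to JSON and sets the matching Content-Type.
print(json_req.headers["Content-Type"])
print(json.loads(json_req.body))
```

A server that expects a JSON body will generally not understand the form-encoded version, even though both carry the same three fields.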
import requests
url = "https://cbscarrickonsuir.app.vsware.ie/"
login_url = 'https://cbscarrickonsuir.vsware.ie/tokenapiV2/login'
payload = {
    "username": "none",
    "password": "[email protected]",
    "source": "web"
}
s = requests.Session()
r = s.post(login_url, json=payload)
print('status:', r.status_code)
print('--- text ---')
print(r.text)
print('----------------')
I don't have an account to log in with, but the request now gets status 401
with the message invalid_username_password, so the server is at least accepting the POST:
status: 401
--- text ---
{"fieldErrors":[],"genericErrors":[{"messageKey":"invalid_username_password","metadata":null}]}
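Since the error body is JSON, a script could read the message keys before moving on to the protected page. The helper below is a sketch based only on the response shown above; login_error_keys is a hypothetical name, and other responses from this site may be shaped differently.

```python
import json

def login_error_keys(body_text):
    """Return the messageKey values from a body shaped like the 401 response above."""
    try:
        body = json.loads(body_text)
    except ValueError:
        # Not JSON at all (for example an HTML or XML error page).
        return []
    if not isinstance(body, dict):
        return []
    errors = body.get("genericErrors") or []
    return [e.get("messageKey") for e in errors if isinstance(e, dict)]

sample = '{"fieldErrors":[],"genericErrors":[{"messageKey":"invalid_username_password","metadata":null}]}'
print(login_error_keys(sample))  # ['invalid_username_password']
```

In the script above it would be called as login_error_keys(r.text) right after s.post(...). What a successful login response looks like is not shown here, so an empty list should be checked together with r.status_code.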