So, I'm trying to scrape questions from Quora at https://www.quora.com/search?q=microwave&type=question. Since the questions are loaded dynamically, I first used Selenium to simulate scrolling down, but it is really slow, so I'm trying a different approach. When you scroll down, Quora sends a POST request with a payload to another URL. I opened the Network tab in the browser's dev tools to see what payload it uses.
It looks like this:
{"queryName":"SearchResultsListQuery","variables":{"query":"microwave","disableSpellCheck":null,"resultType":"question","author":null,"time":"all_times","first":10,"after":"19","tribeId":null},"extensions":{"hash":"f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"}}
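Rather than retyping this payload as a Python dict, where every JSON null has to be translated by hand, one option is to paste the copied request body as a string and let json.loads convert it. A minimal sketch, assuming the body is copied verbatim from the Network tab: JSON null comes back as Python None automatically.

```python
import json

# Raw request body copied verbatim from the browser's Network tab.
raw = '{"queryName":"SearchResultsListQuery","variables":{"query":"microwave","disableSpellCheck":null,"resultType":"question","author":null,"time":"all_times","first":10,"after":"19","tribeId":null},"extensions":{"hash":"f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"}}'

# json.loads maps JSON null to Python None, so nothing needs editing by hand.
payload = json.loads(raw)

print(payload["variables"]["author"])  # None
print(payload["variables"]["first"])   # 10
```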
I ran this:
import requests
url = 'https://www.quora.com/graphql/gql_para_POST?q=SearchResultsListQuery'
data = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.76", "queryName":"SearchResultsListQuery","variables":{"query":"microwave","disableSpellCheck":'null',"resultType":"question","author":'null',"time":"all_times","first":10,"after":"19","tribeId":'null'},"extensions":{"hash":"f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"}}
r = requests.post(url, data = data)
print(r)
And got <Response [400]>
I added my user agent and replaced null with 'null'. I also tried None, '', and even deleting these keys from the dict, but nothing works.
So maybe I have the wrong hash. I searched the whole page HTML and the other requests the site sends and receives to find where the hash comes from, but didn't succeed.
- Is the 400 error caused by the 'null' items?
- Is the hash something commonly used in POST requests, and how can I get it? Thanks
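On the second question: the answer below sends the hash copied from the captured request unchanged, so it does not appear to need generating per request. Its 64 hexadecimal characters are consistent with a SHA-256 digest that identifies a query stored on the server (a "persisted query" setup), but that reading is an assumption, not something Quora documents. A minimal sketch, assuming the hash identifies only the query and not its variables: make_payload is a hypothetical helper that reuses the captured payload and changes just the search term and the "after" cursor. The cursor value "29" below is illustrative; real values should come from captured requests.

```python
import copy
import json

# Template taken from the request captured in dev tools; the hash is reused as-is.
TEMPLATE = {
    "queryName": "SearchResultsListQuery",
    "variables": {
        "query": "microwave",
        "disableSpellCheck": None,
        "resultType": "question",
        "author": None,
        "time": "all_times",
        "first": 10,
        "after": "19",
        "tribeId": None,
    },
    "extensions": {
        "hash": "f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"
    },
}


def make_payload(query, after):
    """Hypothetical helper: copy the captured payload, changing only the search
    term and the "after" cursor; queryName and the hash stay as captured."""
    payload = copy.deepcopy(TEMPLATE)  # never mutate the shared template
    payload["variables"]["query"] = query
    payload["variables"]["after"] = after
    return json.dumps(payload)


body = make_payload("toaster", "29")
print(json.loads(body)["extensions"]["hash"] == TEMPLATE["extensions"]["hash"])  # True
```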
Answer:
First of all, ensure that your payload is properly formatted as JSON, like this:
import json

data = json.dumps({
"queryName": "SearchResultsListQuery",
"variables": {
"query": "microwave",
"disableSpellCheck": None,
"resultType": "question",
"author": None,
"time": "all_times",
"first": 10,
"after": "19",
"tribeId": None
},
"extensions": {
"hash": "f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"
}
})
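This also settles the 'null' question: in Python, None is the value json.dumps writes as JSON null, while the string 'null' is written as the quoted string "null", which is a different value. Note too that in the question's code the dict was passed as data= without serializing it, so requests form-encodes it instead of sending the JSON body seen in dev tools, whichever spelling of null was used. A quick check with the standard library:

```python
import json

# None is the Python value that json.dumps writes as a JSON null.
print(json.dumps({"author": None}))    # {"author": null}
# The string 'null' is written as a quoted JSON string, a different value.
print(json.dumps({"author": "null"}))  # {"author": "null"}
```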
Also, to get a successful response from the Quora GraphQL API, you must include a cookie in your request headers:
headers = {
'cookie': '...',
...
}
r = requests.post(url, headers=headers, data=data)
You can find the cookie in your browser's dev tools.
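Putting the pieces together, a sketch of the full request, built but not sent so it can be inspected offline. It uses the standard library's urllib.request; with requests the equivalent is requests.post(url, headers=headers, json=payload), where json= serializes the dict and sets Content-Type: application/json. Assumptions: "PASTE_COOKIE_HERE" is a placeholder for the cookie string from dev tools, and whether Quora also requires the Content-Type header (or others) has not been verified here.

```python
import json
import urllib.request

URL = "https://www.quora.com/graphql/gql_para_POST?q=SearchResultsListQuery"

payload = {
    "queryName": "SearchResultsListQuery",
    "variables": {
        "query": "microwave", "disableSpellCheck": None, "resultType": "question",
        "author": None, "time": "all_times", "first": 10, "after": "19", "tribeId": None,
    },
    "extensions": {"hash": "f88cad2308823dc82766c0025ca34be70ea3e60d850a756187645d7483ba2c3b"},
}

headers = {
    # Placeholder: paste the cookie string from your browser's dev tools here.
    "cookie": "PASTE_COOKIE_HERE",
    # The User-Agent is a request header, not part of the JSON payload.
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.76",
    # The body is a serialized JSON string, so declare it as JSON.
    "Content-Type": "application/json",
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers=headers,
    method="POST",
)
# Sending is left out; urllib.request.urlopen(req) would perform the POST.
print(req.get_method())  # POST
```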