I'm using an Indeed API from Rapid API to collect job data. The code snippet provided only returns results for 1 page. I was wondering how to set up a for loop to iterate through multiple pages and append the results together.
import requests
url = "https://indeed11.p.rapidapi.com/"
payload = {
"search_terms": "data visualization",
"location": "New York City, NY",
"page": 1,
"fetch_full_text": "yes"
}
headers = {
"content-type": "application/json",
"X-RapidAPI-Key": "{api key here}", # insert here,
"X-RapidAPI-Host": "indeed11.p.rapidapi.com"
}
response = requests.request("POST", url, json=payload, headers=headers)
As seen in the code above, the key "page" is set to a value of 1. How would I parameterize this value, and how would I construct the for loop while appending the results from each page? Thank you so much.
CodePudding user response:
You can paginate by updating the "page" key of payload inside a for loop over range():
import requests
url = "https://indeed11.p.rapidapi.com/"
payload = {
"search_terms": "data visualization",
"location": "New York City, NY",
"page": 1,
"fetch_full_text": "yes"
}
headers = {
"content-type": "application/json",
"X-RapidAPI-Key": "{api key here}", # insert here,
"X-RapidAPI-Host": "indeed11.p.rapidapi.com"
}
responses = []  # one Response object per page
for page in range(1, 11):  # pages 1 through 10
    payload['page'] = page
    response = requests.post(url, json=payload, headers=headers)
    responses.append(response)
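If the goal is one combined list rather than a series of separate responses, the loop can be wrapped in a small function. The following is only a minimal sketch, not the API's documented usage: `collect_pages` and `fake_fetch_page` are hypothetical names, and it assumes each page can be turned into a list of job records. The fake fetcher stands in for the real request so the example runs without an API key.

```python
from typing import Any, Callable, Dict, List


def collect_pages(
    fetch_page: Callable[[int], List[Dict[str, Any]]],
    first_page: int = 1,
    last_page: int = 10,
) -> List[Dict[str, Any]]:
    """Call fetch_page for every page number and merge the results into one list."""
    all_jobs: List[Dict[str, Any]] = []
    for page in range(first_page, last_page + 1):
        # extend (not append) so the result is one flat list of jobs
        all_jobs.extend(fetch_page(page))
    return all_jobs


def fake_fetch_page(page: int) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for the real HTTP call: two fake jobs per page."""
    return [{"page": page, "title": f"job {page}-{n}"} for n in range(2)]


jobs = collect_pages(fake_fetch_page, first_page=1, last_page=3)
print(len(jobs))
```

In real use, `fetch_page` would set payload['page'] to the page number, call requests.post(url, json=payload, headers=headers), and return whatever list of jobs the parsed response contains. The exact JSON layout depends on the API and is not shown in the question.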
CodePudding user response:
You can try this:
max_page = 100
result = {}
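for i in range(1, max_page + 1):
    try:
        payload.update({'page': i})
        if i not in result:
            # store each page's Response under its page number
            result[i] = requests.request("POST", url, json=payload, headers=headers)
    except requests.RequestException:
        # skip pages whose request fails (network error, timeout, ...)
        continue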
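Silently skipping failed pages makes it hard to tell which pages are missing. A possible variation, sketched below with hypothetical names (`fetch_with_skips`, `flaky_fetch_page`), records the failed page numbers so they can be retried later. The fetcher is a stand-in that fails on one page on purpose; with real requests you would likely catch `requests.RequestException` instead of the broad `Exception` used here.

```python
from typing import Callable, Dict, List, Tuple


def fetch_with_skips(
    fetch_page: Callable[[int], List[str]], max_page: int
) -> Tuple[Dict[int, List[str]], List[int]]:
    """Fetch pages 1..max_page, keeping successes and noting failures."""
    result: Dict[int, List[str]] = {}
    failed: List[int] = []
    for i in range(1, max_page + 1):
        try:
            result[i] = fetch_page(i)
        except Exception:
            # broad catch only because the fetcher here is a fake (assumption)
            failed.append(i)
    return result, failed


def flaky_fetch_page(page: int) -> List[str]:
    """Hypothetical stand-in for the HTTP call; page 2 always fails."""
    if page == 2:
        raise ConnectionError("simulated network failure")
    return [f"job from page {page}"]


result, failed = fetch_with_skips(flaky_fetch_page, 3)
print(sorted(result))  # page numbers that succeeded
print(failed)          # page numbers that raised
```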
CodePudding user response:
I think you could do this with a while loop. To implement it, you need a way to detect when there are no more pages to read (for example, a response that contains no results). Here's what I would do:
import requests
url = "https://indeed11.p.rapidapi.com/"
payload = {
"search_terms": "data visualization",
"location": "New York City, NY",
"page": 1,
"fetch_full_text": "yes"
}
headers = {
"content-type": "application/json",
"X-RapidAPI-Key": "{api key here}", # insert here,
"X-RapidAPI-Host": "indeed11.p.rapidapi.com"
}
responses = []
while not no_more_pages():  # no_more_pages() is a placeholder for code that detects when there are no more pages to read
    responses.append(requests.request("POST", url, json=payload, headers=headers))
    payload['page'] += 1  # advance to the next page
Once the loop is done, you could use the responses list to access the data.
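One way to fill in the "no more pages" check is to stop as soon as a page comes back empty. The sketch below assumes, without confirmation from the API's documentation, that requesting a page past the end yields an empty list. `collect_until_empty` and `fake_fetch_page` are hypothetical names, and the fake fetcher replaces the real request so the example runs offline. The `max_pages` cap is a safety net in case the empty-page assumption does not hold.

```python
from typing import Callable, List


def collect_until_empty(
    fetch_page: Callable[[int], List[str]], max_pages: int = 100
) -> List[str]:
    """Read pages in order until one is empty or max_pages is reached."""
    results: List[str] = []
    page = 1
    while page <= max_pages:
        jobs = fetch_page(page)
        if not jobs:
            # assumption: an empty page means there is nothing left to read
            break
        results.extend(jobs)
        page += 1
    return results


def fake_fetch_page(page: int) -> List[str]:
    """Hypothetical stand-in for the HTTP call: three pages of two jobs each."""
    if page > 3:
        return []
    return [f"job {page}-{n}" for n in range(2)]


all_jobs = collect_until_empty(fake_fetch_page)
print(len(all_jobs))
```

If the API instead signals the end with an error status or a repeated last page, the `if not jobs` check would need to change accordingly.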