Unable to find a way to paginate through API data


I'm trying to use Python 3's requests.get to retrieve data from this page using its API. I'd like to retrieve the data found here and save the entire table to my own JSON file.

Here's my attempt so far:

    import json
    import requests

    source = requests.get("https://www.mwebexplorer.com/api/mwebblocks").json()
    with open('mweb.json', 'w') as json_file:
        json.dump(source, json_file)

I've looked through other questions about pagination, and in those cases a for loop can iterate through the pages, but in my case the URL does not change when I click next to go to the next page of data. I also can't use Scrapy's XPath selectors to click next, because the table and its pagination aren't accessible through the page's HTML or XML.

Is there something I can add to my requests.get call to retrieve the entire JSON for all pages of the table?

CodePudding user response:

Depending on what browser you're using it might look different, but in Chrome I can go to the Network tab in DevTools and view the full details of the request. This reveals that it's actually a POST request, not a GET request. If you look at the payload, you can see a bunch of key-value pairs, including a start and a length.

So, try something like

requests.post("https://www.mwebexplorer.com/api/mwebblocks", data={"start": "50", "length": "50"})

or similar. You might need to include the other parts of the form data, depending on the response you get.
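Building on that idea, here is a minimal sketch of how you could loop over pages by bumping start in steps of length and collecting everything into one JSON file. The payload keys ("start", "length") come from the DevTools capture described above; the response keys ("data", "recordsTotal") are assumptions based on a typical DataTables-style server response and may need adjusting to whatever this API actually returns.

    import json
    import requests

    url = "https://www.mwebexplorer.com/api/mwebblocks"
    page_size = 50
    start = 0
    all_rows = []

    while True:
        # Request one page of the table via POST with start/length form data
        resp = requests.post(url, data={"start": str(start), "length": str(page_size)})
        resp.raise_for_status()
        payload = resp.json()

        rows = payload.get("data", [])          # assumed key holding the table rows
        all_rows.extend(rows)

        total = payload.get("recordsTotal", 0)  # assumed key holding the total row count
        start += page_size
        if not rows or start >= total:
            break

    with open("mweb.json", "w") as json_file:
        json.dump(all_rows, json_file)

If the response doesn't use those key names, print one page's JSON first and adapt the loop to whatever fields it exposes for the rows and the total count.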

Keep in mind that sites frequently don't like it when you try to scrape them like this.
