Scraping content from a dynamic webpage

Time:12-24

I have a webpage that updates its content whenever the user reaches the bottom.

I was looking for answers here on how to get the content of the page, and I found this, which suggested checking the browser's developer tools for the details of the request that loads the new content, so that it can be reproduced with Python.

I've tried with this piece of code:

import requests

url = "https://www.pmindia.gov.in/en/tag/pmspeech/"

# get the content of the page when it is loaded for the first time
content = requests.get(url)

# replicate the POST request to show more content
payload = {'action': 'infinite_scroll_speeches', 'page_no': '2', 'tag': 'pmspeech', 'loop_file': '10', 'language': 'en'}
php_url = "https://www.pmindia.gov.in/wp-admin/admin-ajax.php"
new_content = requests.post(php_url, json=payload, verify=False)

but I keep getting error 503 or 404 as a response.

I would like to get a table with the title, the link and the date of each speech. I know I can extract this information afterwards using BeautifulSoup or a similar package, but I'm stuck here.

CodePudding user response:

You are sending the request to the wrong URL. Check the attached screenshot, copy the request from the developer tools as a cURL (bash) command, and convert it to a Python request; you should then be able to get the data.
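When converting the copied cURL command, it also helps to compare how the body is encoded. WordPress `admin-ajax.php` endpoints commonly expect a form-encoded body, which `requests` produces with `data=`, whereas `json=` sends a JSON document instead. The sketch below uses only the standard library to show the two bodies side by side. The payload is the one from the question; the correct URL and any required headers are not known here and should be taken from the copied command.

```python
import json
from urllib.parse import urlencode

# Payload taken from the question; the real values should be copied
# from the request shown in the browser's developer tools.
payload = {
    "action": "infinite_scroll_speeches",
    "page_no": "2",
    "tag": "pmspeech",
    "loop_file": "10",
    "language": "en",
}

# Equivalent to the body that requests.post(url, data=payload) sends
# (Content-Type: application/x-www-form-urlencoded).
form_body = urlencode(payload)

# Equivalent to the body that requests.post(url, json=payload) sends
# (Content-Type: application/json).
json_body = json.dumps(payload)

print(form_body)
print(json_body)
```

If the copied command shows a form-encoded body, the call from the question would become `requests.post(php_url, data=payload)`, with `php_url` replaced by whatever URL the copied command contains.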

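Once the request returns HTML, the title, link and date asked for in the question can be pulled out of it. The following is only a sketch: the markup in `SAMPLE_HTML` is hypothetical, as are the `speech` and `date` class names, so the tag and class checks need to be adapted after inspecting the real response. It uses the standard library's `html.parser` to stay self-contained; the same idea carries over to BeautifulSoup.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real response will differ, so the tag and
# class names below are placeholders to adapt after inspecting it.
SAMPLE_HTML = """
<div class="speech">
  <a href="https://example.com/speech-1">Example speech one</a>
  <span class="date">01 Jan 2000</span>
</div>
<div class="speech">
  <a href="https://example.com/speech-2">Example speech two</a>
  <span class="date">02 Jan 2000</span>
</div>
"""


class SpeechParser(HTMLParser):
    """Collect (title, link, date) rows from the hypothetical markup."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None   # field that the next text node belongs to
        self._current = {}   # row currently being assembled

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            # A link starts a new row; its text is the title.
            self._current = {"link": attrs["href"]}
            self._field = "title"
        elif tag == "span" and attrs.get("class") == "date":
            self._field = "date"

    def handle_data(self, data):
        text = data.strip()
        if self._field and text:
            self._current[self._field] = text
            if self._field == "date":
                # The date is the last field, so the row is complete.
                self.rows.append(
                    (self._current.get("title"), self._current.get("link"), text)
                )
                self._current = {}
            self._field = None


parser = SpeechParser()
parser.feed(SAMPLE_HTML)
for row in parser.rows:
    print(row)
```

The resulting list of tuples can be passed to `csv.writer` or to a DataFrame constructor to get the table.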
