With requests (often used alongside BeautifulSoup), we use response.content to get the raw bytes returned by the URL and write them to a new file. What should we write if we use HTMLSession from requests_html instead?
For example,
import requests
from urllib.parse import urlparse
from requests_html import HTMLSession
session = HTMLSession()
pdf_title = "xyz.pdf"  # output file name
# URL of the PDF to download
URL = "https://academic.oup.com/qje/article/126/4/1593/17089543/qjr041.pdf"
r = session.get(URL, allow_redirects=True)
with open(pdf_title, "wb") as new_pdf:
    print(f"Begin writing to {pdf_title}")
    new_pdf.write(r.html.content)  # This line is not working
Answer:
This is all you need: write r.content, the raw bytes of the response, to the file. When I run it, though, I get "request forbidden by administrative rules"; presumably you have access that gets you past this.
import requests
pdf_title = "xyz.pdf"
URL = "https://academic.oup.com/qje/article/126/4/1593/17089543/qjr041.pdf"
r = requests.get(URL, allow_redirects=True)
with open(pdf_title, "wb") as new_pdf:
    # r.content is the raw response body as bytes, suitable for a binary file
    new_pdf.write(r.content)
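If you want to keep using HTMLSession, the same attribute should work: in requests_html, HTMLSession.get returns an HTMLResponse, which subclasses requests.Response, so r.content holds the raw bytes. r.html is the parsed-HTML wrapper and is not needed for a PDF. The sketch below rests on that assumption. The helper name download_pdf, its optional session parameter (handy for testing without network access), and the check that the body starts with the %PDF signature (to avoid silently saving an error page such as the "forbidden" one mentioned above) are illustrative additions, not part of the original answer.

```python
def download_pdf(url, path, session=None):
    """Fetch url and write the raw response bytes to path.

    Hypothetical helper for illustration; session is any object with a
    requests-style get() method (an HTMLSession by default).
    """
    if session is None:
        # Imported lazily so the helper can also be driven by a stub session.
        from requests_html import HTMLSession
        session = HTMLSession()
    r = session.get(url, allow_redirects=True)
    # Stop on HTTP errors such as 403 instead of saving an error page.
    r.raise_for_status()
    # Heuristic (an assumption): real PDF files begin with the bytes "%PDF".
    if not r.content.startswith(b"%PDF"):
        raise ValueError("Response body does not look like a PDF")
    # HTMLResponse subclasses requests.Response, so .content is the raw bytes.
    with open(path, "wb") as new_pdf:
        new_pdf.write(r.content)
    return len(r.content)
```

With the question's variables, this would be called as download_pdf(URL, pdf_title). Whether the server actually returns the file still depends on your access.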