How can I get the content of a URL and write it to a new file using HTMLSession in Python?

Time:05-05

When scraping with requests and BeautifulSoup, we use response.content to get the content of the URL and write it to a new file. What should we write if we use HTMLSession from requests_html instead?

For example,

import requests
from urllib.parse import urlparse
from requests_html import HTMLSession

session = HTMLSession()

# Specify the PDF URL here
URL = "https://academic.oup.com/qje/article/126/4/1593/17089543/qjr041.pdf"
pdf_title = "xyz.pdf"  # output file name
r = session.get(URL, allow_redirects=True)
with open(pdf_title, "wb") as new_pdf:
    print(f"Begin writing to {pdf_title}")
    new_pdf.write(r.html.content)  # This line is not working

Answer:

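Use r.content rather than r.html.content: the raw bytes of the body live on the response object itself, and the response returned by HTMLSession.get is an ordinary requests response with the same attribute. For a plain file download you do not need requests_html at all, so this is all you need. When I run it, though, the server answers "request forbidden by administrative rules". Presumably, you have the key to get past this.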

import requests

pdf_title = "xyz.pdf"
URL = "https://academic.oup.com/qje/article/126/4/1593/17089543/qjr041.pdf"
r = requests.get(URL, allow_redirects=True)
# r.content is the raw response body as bytes, so open the file in binary mode
with open(pdf_title, "wb") as new_pdf:
    new_pdf.write(r.content)
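
If you would rather stay with HTMLSession, the same idea can be sketched as below. This is only a sketch under a few assumptions: save_url and main are hypothetical helper names (not part of requests or requests_html), "xyz.pdf" is just a placeholder file name, and the raise_for_status() call is my addition so that a "forbidden" error page is not silently saved as a PDF.

```python
from pathlib import Path


def save_url(session, url, path):
    """Fetch url with a requests-style session and write the raw body to path.

    Hypothetical helper: works with requests.Session or requests_html.HTMLSession.
    Returns the number of bytes written.
    """
    response = session.get(url, allow_redirects=True)
    # Raise on 4xx/5xx so an error page is not written out as the PDF.
    response.raise_for_status()
    # .content is the undecoded body as bytes (not r.html.content).
    Path(path).write_bytes(response.content)
    return len(response.content)


def main():
    # Imported here so save_url itself does not depend on requests_html.
    from requests_html import HTMLSession

    url = "https://academic.oup.com/qje/article/126/4/1593/17089543/qjr041.pdf"
    pdf_title = "xyz.pdf"  # placeholder output name
    size = save_url(HTMLSession(), url, pdf_title)
    print(f"Wrote {size} bytes to {pdf_title}")
```

Calling main() performs the actual download. If the server still refuses the request, raise_for_status() raises an exception instead of leaving you with a broken xyz.pdf.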