Home > other >  Download direct download to specific folder
Download direct download to specific folder

Time:07-02

I have this code where I am using beautifulsoup to get a url for a direct download link to a pdf and save it into a specific directory.

Getting the url for the link works, it is just getting it to download and save in a directory that is causing problems. I have searched through several sources and finally tried urllib.urlretrieve but this is not working.

Could I get assistance, please. Edit: Main thing I am trying to do is to get the code to download the direct link and not having to manually click it.

If the url is needed for helping here it is; https://www.odfl.com/us/en/resources/tariffs/tariff-odfl-100-0.html As I am parsing through an xml, as there are several other urls that require different code.

import sys
import os
import re
from bs4 import BeautifulSoup
from urllib.parse import urlparse
import urllib
import xml.etree.ElementTree as ET

def ODFL100(tariff_id):

    try:

        pdf_path = ParseXML(tariff_id)
        r = requests.get(pdf_path,stream = True)
        download_path = ParseXML(tariff_id,1)

        link_list = []
        pdf_link = ""
        pdf_found = 0

        r = requests.get(pdf_path)
        soup = BeautifulSoup(r.text, "html.parser")

        base = urlparse(pdf_path)
        for i in soup.find_all('a'):
            if pdf_found == 0:
                current_link = i.get('href')
            else:
                break
            if current_link.endswith('pdf'):
                link_list.append(base.scheme "://" base.netloc   current_link)
                for l in link_list:
                    pdf_link = l
                    if "rates-and-tariffs"  in pdf_link:
                        pdf_found = 1
        urllib.urlretrieve(pdf_link, download_path)

    except Exception as error:
        error_message = "A "   str(error)

CodePudding user response:

I don't quite understand what you want to achieve. But here is an example code that will download a pdf file from a specified page.

import requests
from bs4 import BeautifulSoup


url = 'https://www.odfl.com/us/en/resources/tariffs/tariff-odfl-100-0.html'
response = requests.get(url)
link = 'https://www.odfl.com'   BeautifulSoup(response.text, 'lxml').find('a', class_='cmp-form-button').get('href')
with open(requests.utils.unquote((link.split('/')[-1])), 'wb') as f:
    f.write(requests.get(link).content)
  • Related