How can we download multiple CSV files from a URL?


I am testing this code.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


d = webdriver.Chrome('C:\\Utility\\chromedriver.exe')
d.get('https://developers.humana.com/Resource/PCTFilesList?fileType=innetwork')

# stuck here...
#links = 
for link in links:
    d.get(link)
    # click page 2, 3, 4, etc., up to 100
    for page in range(1, 100):
        page.click()
d.quit()

So, I am trying to download CSV files on page 1, then click page 2 and download those files, and then click page 3 and again download those files. The sample code that I shared here should be a start, I think, but it definitely needs some improvements to work right. Any idea how I can do this? Thanks!

CodePudding user response:

You can use this solution:

import requests

length = 1
url = ("https://developers.humana.com/Resource/GetData?fileType=innetwork"
       "&sEcho=1&iColumns=3&sColumns=,,&iDisplayStart=0&iDisplayLength=")

# First request with a length of 1 just to read the total record count
r = requests.get(url + str(length))
json_data = r.json()

length = json_data['iTotalRecords']
print("files ", length)

# Second request asks for all records at once
r = requests.get(url + str(length))
json_data = r.json()

for e in json_data['aaData']:
    download_url = "https://developers.humana.com/Resource/DownloadPCTFile?fileType=innetwork&fileName=" + e['name']
    print(e['name'])
    print("download url: ", download_url)

Then just download the files in a loop.
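For example, a minimal sketch of that download loop, assuming the json_data variable from the snippet above and that each file can simply be streamed to disk with requests (the csv_files output folder is just an illustration):

import os
import requests

out_dir = "csv_files"  # local folder for the downloads (an assumption, adjust as needed)
os.makedirs(out_dir, exist_ok=True)

for e in json_data['aaData']:
    download_url = ("https://developers.humana.com/Resource/DownloadPCTFile"
                    "?fileType=innetwork&fileName=" + e['name'])
    # Stream the response so large CSV files are not held in memory all at once
    with requests.get(download_url, stream=True) as resp:
        resp.raise_for_status()
        with open(os.path.join(out_dir, e['name']), "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                f.write(chunk)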

CodePudding user response:

wait = WebDriverWait(d, 20)
d.get('https://developers.humana.com/Resource/PCTFilesList?fileType=innetwork')
for i in range(2, 101):
    time.sleep(1)
    # The pager only renders buttons with data-dt-idx up to 5, so after page 5
    # keep clicking the button with index 5 to advance.
    j = i
    if i > 5:
        j = 5
    #links = d.find_elements(By.CSS_SELECTOR, "a.download-pct-file-link")
    #print(len(links))
    #for link in links:
    #    link.click()
    wait.until(EC.element_to_be_clickable((By.XPATH, f"//a[@data-dt-idx='{j}']"))).click()
    print(f"//a[@data-dt-idx='{j}']")

I got it to go through the pages by switching the index to click to 5 after page 5: data-dt-idx went from 2 to 5 and then stayed at 5. You can most likely do it without time.sleep() if you handle the stale-element exceptions.

Import:

import time
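As a rough sketch of the stale-element handling mentioned above, reusing the d driver from the question, the a.download-pct-file-link selector from the commented lines, and a hypothetical click_when_stable helper (the retry logic is an assumption, not something the site confirms):

from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

wait = WebDriverWait(d, 20)

def click_when_stable(xpath, attempts=3):
    # Re-locate and click the element, retrying if the pager re-renders underneath us
    for _ in range(attempts):
        try:
            wait.until(EC.element_to_be_clickable((By.XPATH, xpath))).click()
            return
        except StaleElementReferenceException:
            continue
    raise TimeoutError(f"Could not click {xpath}")

for i in range(2, 101):
    # Download the links on the current page before moving on (selector is an assumption)
    for link in d.find_elements(By.CSS_SELECTOR, "a.download-pct-file-link"):
        link.click()
    j = min(i, 5)  # pager button index stays at 5 after page 5
    click_when_stable(f"//a[@data-dt-idx='{j}']")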