How can I download and save a file on a mapped server drive using python/selenium?

Time:10-06

I'm trying to automate the download process of some reports in my company, using selenium-chromedriver and python.

For development and testing, I mirrored the relevant server folder structure on my PC, since the script has to navigate between folders.

What I'm attempting to do is basically:

  1. Open a site;
  2. Navigate to the report page;
  3. Set report parameters;
  4. Download the report (it's an .xlsx file) to a specific folder;
  5. Append this file to an existing .xlsx file in the same folder;
  6. Save and close;
  7. Repeat for other folders;

Here's my problem: When I'm working with the folders on my own PC (as the default download folders) the code works perfectly, but when I change the path to the actual folders (on the company server) the download returns a "Fail to download" error.

I checked the folder permissions, and I have both read and write permissions and I work on and download to these folders on a daily basis. Also, before the download, I need to read a file in said folder to check the most recent date and that works without any errors.
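A quick way to confirm that the Python process itself can write to a folder (as opposed to only reading from it) is to create a throwaway file there. This is only a sketch: the function name can_write_to is made up for illustration, and it says nothing about whether Chrome can write to the same path.

```python
import tempfile


def can_write_to(directory):
    """Return True if a temporary file can be created (and removed) in `directory`."""
    try:
        # The file is deleted automatically when the context manager exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        # Covers a missing folder, denied permissions, unreachable share, etc.
        return False


# Hypothetical usage inside the loop of the main script:
# print(can_write_to(download_dir))
```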

This is the function that creates the webdriver object:

def launch_driver(download_dir):
    """
    Set <download_dir> to the default download location.
    """
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--window-size=1360,728")
    # options.add_argument("--window-size=1920,1080")
    options.add_argument("--disable-notifications")
    options.add_argument('--no-sandbox')
    options.add_argument('--verbose')
    options.add_argument("--disable-single-click-autofill")
    options.add_argument("--disable-autofill-keyboard-accessory-view")
    options.add_experimental_option("prefs", {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "safebrowsing_for_trusted_sources_enabled": False,
        "safebrowsing.enabled": False
    })
    options.add_argument('--disable-gpu')
    options.add_argument('--disable-software-rasterizer')
    options.add_argument('--log-level=3')
    options.binary_location = 'C:\\Users\\user\\AppData\\Local\\' \
        'Google\\Chrome Beta\\Application\\chrome.exe'

    return webdriver.Chrome(options=options, service=s)

And here is the main code:

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By #The selenium imports are for the get_tarefas function
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from time import sleep, time
from datetime import date, timedelta
import warnings
from os import remove

warnings.filterwarnings('ignore', category=UserWarning, module='openpyxl')

if __name__ == "__main__":
    user = 'username'
    senha = 'password'

    start = time()

    unidades = {
        "gssfs": "KPIs SFS", "gs4quiteria": "KPIs Quitéria", "gs7": "KPIs GS7"
    }

    # Start the Selenium service
    s = Service(executable_path='C:\\Python\\chromedriver.exe')

    for i in unidades:

        inner_start = time()

        url = "https://" + i + ".site_I_download_from.com.br/login"

        download_dir = r"//path_to_mapped_drive/" + unidades.get(i)

        # Reads .xlsx file into dataframe and check dates.
        racDF = pd.read_excel(
            download_dir + "/Arraçoamento.xlsx", sheet_name="Dados")
        racDF["Data"] = racDF["Data"].dt.date
        lastdate = racDF["Data"].iloc[-1] + timedelta(1)
        currdate = date.today() + timedelta(-2)

        # Check whether an update is needed (TODAY - 2 DAYS = UP TO DATE)
        if currdate <= lastdate:
            print("Dados em {} já estão atualizados! Última data é {}"
                  .format(unidades.get(i), lastdate + timedelta(-1)))
            continue
        else:
            # Navigates to the url and clicks the export button. Not the problem.
            get_tarefas(url, download_dir, lastdate, currdate)  # <--- launch_driver is called inside
        
            # Reads downloaded .xlsx file into dataframe.
            tarefas = pd.read_excel(
                download_dir + "/exportar_tarefas.xlsx", sheet_name="Dados")
            tarefas["Data"] = tarefas["Data"].dt.date

            # Appending both files.
            racDF = pd.concat([racDF, tarefas], ignore_index=True)
            racDF.to_excel(download_dir + "/Arraçoamento.xlsx",
                           sheet_name="Dados", index=False)

            # Delete the downloaded file at the end of the iteration.
            remove(download_dir + "/exportar_tarefas.xlsx")

        # Report that this iteration has finished.
        idx = unidades.get(i).find(" ")
        inner_time = timedelta(seconds=(time() - inner_start))
        print("Unidade {} concluída! Tempo total: {}"
              .format(unidades.get(i)[idx + 1:], remove_micros(inner_time)))

    end = time()
    total_time = timedelta(seconds=(end - start))
    print("Script finalizado!")
    print("Tempo total para execução: {}".format(remove_micros(total_time)))

(Sorry if the code is messy, I don't have much experience)

I didn't put the function that navigates the browser because the question is already huge, and I know it's not the problem.

I suspect it's something in the options when I initialize the browser, but I couldn't find what. I know it's hard to evaluate code without testing it, but could you please take a look?

If you need any more info, I'll update the question!

Thank you!

CodePudding user response:

I don't know why this isn't working with a network address. But as a workaround you could define a user data directory:

chrome_user_data_path = "C:/Users/youruser/selenium"
options.add_argument(f"user-data-dir={chrome_user_data_path}")

The first time you run the script, comment out options.add_argument("--headless") and add a time.sleep(60) immediately after you open the browser. During that pause, change the default download directory manually in the opened Chrome window (through the settings UI). Once you have done this, restore options.add_argument("--headless") and remove the time.sleep(60). On later runs, the download directory should stay set to the network address.

Edit: I just realized that this will not work for you as is, because you need to switch between different subfolders under your network address. One option would be to define a separate user data directory for every key in your unidades dict.
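A rough sketch of that idea, deriving one user data directory per key of unidades. The helper names and the base path are assumptions made for illustration; each profile would still need its download directory set once by hand, as described above.

```python
# Same keys as in the question's unidades dict.
unidades = {
    "gssfs": "KPIs SFS", "gs4quiteria": "KPIs Quitéria", "gs7": "KPIs GS7"
}

# Hypothetical base folder that holds one Chrome profile per unit.
BASE_USER_DATA_PATH = "C:/Users/youruser/selenium"


def user_data_dir_for(unit_key, base=BASE_USER_DATA_PATH):
    """Build the profile folder for one unit, e.g. <base>/gs7."""
    return f"{base}/{unit_key}"


def profile_argument(unit_key, base=BASE_USER_DATA_PATH):
    """Return the string to pass to options.add_argument() for this unit."""
    return f"user-data-dir={user_data_dir_for(unit_key, base)}"


for key in unidades:
    print(key, "->", profile_argument(key))
```

Inside launch_driver, the result of profile_argument(i) would then be passed to options.add_argument() in place of the fixed path.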

Edit 2: Try to use

download_dir = "\\\\path_to_mapped_drive\\" + unidades.get(i)

(note the \\\\). On my end, this worked.
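If other parts of the script keep building paths with forward slashes, the backslash form can be produced with ntpath from the standard library instead of typing the escapes by hand. This is a minimal sketch: "server" and "share" are placeholder names, the helper name is made up, and whether the separator style is really what made the difference has only been observed here, not verified.

```python
import ntpath  # Windows path rules, importable on any platform


def windows_download_dir(base, subfolder):
    """Join base and subfolder, then convert every separator to a backslash."""
    return ntpath.normpath(ntpath.join(base, subfolder))


# "//server/share/" stands in for the real network location.
print(windows_download_dir("//server/share/", "KPIs SFS"))
```

The returned string could then be used as the value of "download.default_directory" in the prefs dict.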
