Python chunks write to excel-CodePudding

I am new to python and I m learning by doing.

At this moment, my code is running quite slow and it seems to take longer and longer by each time I run it.

The idea is to download an employee list as CSV, then to check the location of each Employee ID by running it trough a specific page then writing it to an excel file.

We have around 600 associates on site each day and I need to find their location and to keep refreshing it each 2-4 minutes.

EDIT:

For everyone to have a better understanding, I have a CSV file ( TOT.CSV ) that contains Employee ID's, Names and other information of the associates that I have on site.

In order to get their location, I need to run each employee ID from that CSV file trough https://guided-coaching-dub.corp.amazon.com/api/employee-location-svc/GetLastSeenLocationOfEmployee?employeeId= 1 by 1 and at the same time to write it in another CSV file ( Location.csv ). Right now, it does in about 10 minutes and I want to understand if the way I did it is the best possible way, or if there is something else that I could try.

My code looks like this:

# GET EMPLOYEE ID FROM THE CSV


data = read_csv("Z:\\_Tracker\\Dump\\attendance\\TOT.csv")

# converting column data to list
TOT_employeeID = data['Employee ID'].tolist()


# Clean the Location Sheet


with open("Z:\\_Tracker\\Dump\\attendance\\Location.csv", "w") as f:
    pass

print("Previous Location data cleared ... ")


# go through EACH employee ID to find out location


for x in TOT_employeeID:
    driver.get(
        "https://guided-coaching-dub.corp.amazon.com/api/employee-location-svc/GetLastSeenLocationOfEmployee?employeeId="   x)
    print("Getting Location data for EmployeeID: "   x)
    locData = driver.find_element(By.TAG_NAME, 'body').text
    aaData = str(locData)
    realLoc = aaData.split('"')

    # write to excel
    with open("Z:\\_Tracker\\Dump\\attendance\\Location.csv",
              "a") as f:
        writer = csv.writer(f)
        writer.writerow(realLoc)

time.sleep(5)
print("Employee Location data downloaded...")

Is there a way I can do this faster?

Thank you in advance!

Regards, Alex

CodePudding user response：

You could try separating the step of reading the information and writing the information to your CSV file, like below:

# Get Employee Location Information
# Create list for employee information, to be used below
employee_Locations = []
 
for x in TOT_employeeID:
    driver.get("https://guided-coaching-dub.corp.amazon.com/api/employee-location-svc/GetLastSeenLocationOfEmployee?employeeId="   x)
    print("Getting Location data for EmployeeID: "   x)
    locData = driver.find_element(By.TAG_NAME, 'body').text
    aaData = str(locData)
    realLoc = [aaData.split('"')]
    employee_Locations.extend(realLoc)
            
# Write to excel - Try this as a separate step
with open("Z:\\_Tracker\\Dump\\attendance\\Location.csv","a") as f:
    writer = csv.writer(f, delimiter='\n')
    writer.writerow(employee_Locations)
            
print("Employee Location data downloaded...")

You may see some performance gains by collecting all your information first, then writing to your CSV file

CodePudding user response：

Something like this.

import concurrent.futures


def process_data(data: pd.DataFrame) -> None:
    associates = data['Employee ID'].unique()
    with concurrent.futures.ProcessPoolExecutor() as executer:
        executer.map(get_location, associates)


def get_location(associate: str) -> None:
    driver.get(
        "https://guided-coaching-dub.corp.amazon.com/api/employee-location-svc/GetLastSeenLocationOfEmployee?"
        f"employeeId={associate}")
    print(f"Getting Location data for EmployeeID: {associate}")
    realLoc = str(driver.find_element(By.TAG_NAME, 'body').text).split('"')

    with open("Z:\\_Tracker\\Dump\\attendance\\Location.csv", "a") as f:
        writer = csv.writer(f)
        writer.writerow(realLoc)


if __name__ == "__main__":
    data = read_csv("Z:\\_Tracker\\Dump\\attendance\\TOT.csv")
    process_data(data)