Home > Back-end >  Python chunks write to excel
Python chunks write to excel

Time:10-13

I am new to python and I m learning by doing.

At this moment, my code is running quite slow and it seems to take longer and longer by each time I run it.

The idea is to download an employee list as CSV, then to check the location of each Employee ID by running it trough a specific page then writing it to an excel file.

We have around 600 associates on site each day and I need to find their location and to keep refreshing it each 2-4 minutes.

EDIT:

For everyone to have a better understanding, I have a CSV file ( TOT.CSV ) that contains Employee ID's, Names and other information of the associates that I have on site.

In order to get their location, I need to run each employee ID from that CSV file trough https://guided-coaching-dub.corp.amazon.com/api/employee-location-svc/GetLastSeenLocationOfEmployee?employeeId= 1 by 1 and at the same time to write it in another CSV file ( Location.csv ). Right now, it does in about 10 minutes and I want to understand if the way I did it is the best possible way, or if there is something else that I could try.

My code looks like this:

# GET EMPLOYEE ID FROM THE CSV


data = read_csv("Z:\\_Tracker\\Dump\\attendance\\TOT.csv")

# converting column data to list
TOT_employeeID = data['Employee ID'].tolist()


# Clean the Location Sheet


with open("Z:\\_Tracker\\Dump\\attendance\\Location.csv", "w") as f:
    pass

print("Previous Location data cleared ... ")


# go through EACH employee ID to find out location


for x in TOT_employeeID:
    driver.get(
        "https://guided-coaching-dub.corp.amazon.com/api/employee-location-svc/GetLastSeenLocationOfEmployee?employeeId="   x)
    print("Getting Location data for EmployeeID: "   x)
    locData = driver.find_element(By.TAG_NAME, 'body').text
    aaData = str(locData)
    realLoc = aaData.split('"')

    # write to excel
    with open("Z:\\_Tracker\\Dump\\attendance\\Location.csv",
              "a") as f:
        writer = csv.writer(f)
        writer.writerow(realLoc)

time.sleep(5)
print("Employee Location data downloaded...")

Is there a way I can do this faster?

Thank you in advance!

Regards, Alex

CodePudding user response:

You could try separating the step of reading the information and writing the information to your CSV file, like below:

# Get Employee Location Information
# Create list for employee information, to be used below
employee_Locations = []
 
for x in TOT_employeeID:
    driver.get("https://guided-coaching-dub.corp.amazon.com/api/employee-location-svc/GetLastSeenLocationOfEmployee?employeeId="   x)
    print("Getting Location data for EmployeeID: "   x)
    locData = driver.find_element(By.TAG_NAME, 'body').text
    aaData = str(locData)
    realLoc = [aaData.split('"')]
    employee_Locations.extend(realLoc)
            
# Write to excel - Try this as a separate step
with open("Z:\\_Tracker\\Dump\\attendance\\Location.csv","a") as f:
    writer = csv.writer(f, delimiter='\n')
    writer.writerow(employee_Locations)
            
print("Employee Location data downloaded...")

You may see some performance gains by collecting all your information first, then writing to your CSV file

CodePudding user response:

Something like this.

import concurrent.futures


def process_data(data: pd.DataFrame) -> None:
    associates = data['Employee ID'].unique()
    with concurrent.futures.ProcessPoolExecutor() as executer:
        executer.map(get_location, associates)


def get_location(associate: str) -> None:
    driver.get(
        "https://guided-coaching-dub.corp.amazon.com/api/employee-location-svc/GetLastSeenLocationOfEmployee?"
        f"employeeId={associate}")
    print(f"Getting Location data for EmployeeID: {associate}")
    realLoc = str(driver.find_element(By.TAG_NAME, 'body').text).split('"')

    with open("Z:\\_Tracker\\Dump\\attendance\\Location.csv", "a") as f:
        writer = csv.writer(f)
        writer.writerow(realLoc)


if __name__ == "__main__":
    data = read_csv("Z:\\_Tracker\\Dump\\attendance\\TOT.csv")
    process_data(data)
  • Related