What would be the fastest way to load Device IDs from an Excel sheet containing 800 Device IDs and pass each one to an HTTP GET request?
I'm fetching the Device IDs from the sheet, making an HTTP GET request per ID to get the relevant data, dumping the results into a list, and then saving that to an Excel file using:
import openpyxl
import pandas as pd
import requests


def iter_rows(sheet):
    # Yield each spreadsheet row as a list of cell values.
    for row in sheet.iter_rows():
        yield [cell.value for cell in row]


if __name__ == '__main__':
    # Raw string so the backslashes in the Windows path aren't treated as escapes.
    excel_file = openpyxl.load_workbook(r"D:\mypath\Book1.xlsx")
    active_sheet = excel_file["Sheet4"]  # get_sheet_by_name() is deprecated

    res = iter_rows(active_sheet)
    keys = next(res)  # the first row holds the column headers
    final_data_to_dump = []
    failed_data_dump = []
    header_events = {'Authorization': 'Basic authkey_here'}

    for new in res:
        inventory_data = dict(zip(keys, new))
        if None in inventory_data.values():
            continue  # skip rows with missing cells
        url_get_event = 'https://some_url&source={}'.format(inventory_data['DeviceID'])
        print(inventory_data['DeviceID'])
        try:
            r3 = requests.get(url_get_event, headers=header_events)
            r3_json = r3.json()
            if r3_json['events']:
                for event in r3_json['events']:  # 'event' rather than shadowing the built-in 'object'
                    dict_excel_data = {
                        "DeviceID": event['source']['id'],
                        "Device Name": event['source']['name'],
                        "Start 1": event['Start1'],
                        "Start 2": event['Start2'],
                        "Watering Mode": event['WateringMode'],
                        "Duration": event['ActuationDetails']['Duration'],
                        "Type": event['type'],
                        "Creation Time": event['creationTime']
                    }
                    final_data_to_dump.append(dict_excel_data)
            else:
                no_dict_excel_data = {
                    "DeviceID": inventory_data["DeviceID"],
                    "Device Name": inventory_data["DeviceName"],
                    "Start 1": "",
                    "Start 2": "",
                    "Watering Mode": "",
                    "Duration": "",
                    "Type": "",
                    "Creation Time": ""
                }
                final_data_to_dump.append(no_dict_excel_data)
        except requests.ConnectionError:
            failed_dict_excel_data = {
                "DeviceID": inventory_data['DeviceID'],
                "Device Name": inventory_data["DeviceName"],
                "Status": "Connection Error"
            }
            failed_data_dump.append(failed_dict_excel_data)

    df = pd.DataFrame.from_dict(final_data_to_dump)
    df2 = pd.DataFrame.from_dict(failed_data_dump)
    df.to_excel(r'D:\mypath\ReportReceived_10Apr.xlsx', sheet_name='Sheet1', index=False)
    df2.to_excel(r'D:\mypath\Failed_ReportReceived_10Apr.xlsx', sheet_name='Sheet1', index=False)
But this can take upwards of 10-15 minutes for the 800 devices in the Book1 sheet, and that count is likely to grow. How can I make this process faster?
CodePudding user response:
You can use an async library, but the easiest solution here would be to do something like:

from concurrent.futures import ThreadPoolExecutor

def get(device_id):
    url_get_event = 'https://some_url&source={}'.format(device_id)
    return requests.get(url_get_event)

with ThreadPoolExecutor() as exc:
    responses = exc.map(get, device_ids)
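Since Executor.map returns results in the same order as its inputs, you can zip the responses back to their Device IDs afterwards. A minimal sketch, assuming a device_ids list collected from the Book1 sheet (inventory_rows is a hypothetical stand-in for however you read it) and the get helper above:

device_ids = [row['DeviceID'] for row in inventory_rows]  # hypothetical: IDs read from Book1.xlsx

with ThreadPoolExecutor(max_workers=20) as exc:
    # list() drains the iterator while the pool is still open
    responses = list(exc.map(get, device_ids))

for device_id, response in zip(device_ids, responses):
    # build the final_data_to_dump entries here, exactly as in the original loop
    ...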
If the rest of the per-device work is small, you may instead want to submit the calls to the executor and use as_completed, so the main thread can handle each result as soon as it arrives while the remaining requests are still in flight.
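A minimal sketch of that submit/as_completed variant, reusing the hypothetical get helper and device_ids list from above; the except branch mirrors the original failed_data_dump bookkeeping:

from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

with ThreadPoolExecutor(max_workers=20) as exc:
    # Map each future back to the Device ID it was submitted with.
    futures = {exc.submit(get, device_id): device_id for device_id in device_ids}
    for future in as_completed(futures):
        device_id = futures[future]
        try:
            response = future.result()  # re-raises any exception from the worker thread
            # parse response.json() and append to final_data_to_dump here
        except requests.ConnectionError:
            failed_data_dump.append({"DeviceID": device_id, "Status": "Connection Error"})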