Fastest way of updating a Python list based on indices

Time:04-24

I have a Python dictionary like this -

my_dict = {'Names':['Tom', 'Mariam', 'Lata', 'Tina', 'Abin'], 
           'Attendance':[False, False, False, False, False]}

I also have a Python list of flags giving the indices that need to be changed to True in my_dict['Attendance'] -

flag_list = [0, 2, 3]

Based on the flag_list, my_dict needs to be changed to -

my_dict = {'Names':['Tom', 'Mariam', 'Lata', 'Tina', 'Abin'], 
           'Attendance':[True, False, True, True, False]}

What would be the fastest way of achieving this? Can it be done without a loop? Thank you for any guidance.

CodePudding user response:

Using a loop

for index in flag_list:
    my_dict['Attendance'][index] = True

A micro-optimization would be to fetch the list from the dict only once:

attendance_list = my_dict['Attendance']
for index in flag_list:
    attendance_list[index] = True

But unless flag_list is thousands of elements long, I wouldn't worry about it.
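As a self-contained check, the pre-fetching loop can be wrapped in a small helper and run against the data from the question. This is only a sketch; the name mark_present is made up for illustration and is not part of any library.

```python
def mark_present(attendance, indices):
    """Set attendance[i] to True for every i in indices (modifies the list in place)."""
    for index in indices:
        attendance[index] = True
    return attendance


my_dict = {'Names': ['Tom', 'Mariam', 'Lata', 'Tina', 'Abin'],
           'Attendance': [False, False, False, False, False]}
flag_list = [0, 2, 3]

# The list is looked up in the dict once, when it is passed to the helper.
mark_present(my_dict['Attendance'], flag_list)
print(my_dict['Attendance'])  # [True, False, True, True, False]
```

Passing the list as an argument has the same effect as the attendance_list variable above: the dict lookup happens once rather than on every iteration.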

Using vectorization

If you are willing to take advantage of vectorization you can use a numpy array:

import numpy as np

my_dict = {'Names':['Tom', 'Mariam', 'Lata', 'Tina', 'Abin'],
           'Attendance': np.array([False, False, False, False, False])}
flag_list = [0, 2, 3]
my_dict['Attendance'][flag_list] = True

But again, unless your data is very big I wouldn't worry about optimizing this piece of code very much.
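If the rest of your code expects a plain Python list, as in the question, one option is to convert to an array, assign, and convert back. This is a rough sketch under that assumption; note that both conversions walk the whole list, so it is only likely to pay off if the data can stay in an array across many updates.

```python
import numpy as np

my_dict = {'Names': ['Tom', 'Mariam', 'Lata', 'Tina', 'Abin'],
           'Attendance': [False, False, False, False, False]}
flag_list = [0, 2, 3]

# Convert the list to a boolean array, update all flagged positions at once,
# then convert back so the dict holds ordinary Python bools again.
attendance = np.array(my_dict['Attendance'], dtype=bool)
attendance[flag_list] = True
my_dict['Attendance'] = attendance.tolist()

print(my_dict['Attendance'])  # [True, False, True, True, False]
```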

Example timings

import random

from timeit import Timer

import numpy as np


ATTENDANCE_LIST_SIZE = 100000
FLAG_LIST_SIZE = 60000

dict_with_numpy = {'Attendance': np.random.choice([False, True], 
                                 ATTENDANCE_LIST_SIZE)}
dict_without_numpy = {'Attendance': random.choices([False, True], 
                                    k=ATTENDANCE_LIST_SIZE)}
flag_list = random.choices(range(ATTENDANCE_LIST_SIZE), k=FLAG_LIST_SIZE)


def using_numpy():
    dict_with_numpy['Attendance'][flag_list] = True


def no_numpy_pre_fetching_list():
    attendance_list = dict_without_numpy['Attendance']
    for index in flag_list:
        attendance_list[index] = True


def no_numpy():
    for index in flag_list:
        dict_without_numpy['Attendance'][index] = True


print(f'no_numpy\t\t\t\t\t\t{min(Timer(no_numpy).repeat(3, 3))}')
print(f'no_numpy_pre_fetching_list\t\t{min(Timer(no_numpy_pre_fetching_list).repeat(3, 3))}')
print(f'using_numpy\t\t\t\t\t\t{min(Timer(using_numpy).repeat(3, 3))}')

For this amount of data, the output on my machine is (each figure is the best of 3 repeats, in seconds, for 3 calls of the function):

no_numpy                        0.009737916999999985
no_numpy_pre_fetching_list      0.0048406370000000365
using_numpy                     0.009164470000000036

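So for data of this size, vectorization is not the fastest option: the plain loop with the list fetched once takes roughly half the time of the other two approaches.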
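One likely contributor to the NumPy timing is that flag_list is a plain Python list, which NumPy has to convert to an index array on every call. The sketch below times both variants so you can compare them on your own machine. The names flag_array, using_numpy_list_index and using_numpy_array_index are introduced here for illustration, and no particular result is claimed.

```python
import random
from timeit import Timer

import numpy as np

ATTENDANCE_LIST_SIZE = 100000
FLAG_LIST_SIZE = 60000

attendance = np.zeros(ATTENDANCE_LIST_SIZE, dtype=bool)
flag_list = random.choices(range(ATTENDANCE_LIST_SIZE), k=FLAG_LIST_SIZE)
flag_array = np.array(flag_list)  # converted once, outside the timed code


def using_numpy_list_index():
    # The Python list is converted to an index array on each call.
    attendance[flag_list] = True


def using_numpy_array_index():
    # The index array already exists, so no conversion is needed here.
    attendance[flag_array] = True


print(f'using_numpy_list_index\t\t{min(Timer(using_numpy_list_index).repeat(3, 3))}')
print(f'using_numpy_array_index\t\t{min(Timer(using_numpy_array_index).repeat(3, 3))}')
```

Both functions leave attendance in the same state, so any difference in the printed times comes from how the indices are supplied.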
