I have a Python dictionary like this -
my_dict = {'Names':['Tom', 'Mariam', 'Lata', 'Tina', 'Abin'],
'Attendance':[False, False, False, False, False]}
I also have a Python list of flags giving the indices that need to be changed to True in my_dict['Attendance'] -
flag_list = [0, 2, 3]
Based on the flag_list, my_dict needs to be changed to -
my_dict = {'Names':['Tom', 'Mariam', 'Lata', 'Tina', 'Abin'],
'Attendance':[True, False, True, True, False]}
What would be the fastest way of achieving this? Can it be done without a loop? Thank you for any guidance.
CodePudding user response:
Using a loop
for index in flag_list:
    my_dict['Attendance'][index] = True
A micro optimization would be to fetch the list from the dict only once:
attendance_list = my_dict['Attendance']
for index in flag_list:
    attendance_list[index] = True
But unless flag_list is thousands of elements long, I wouldn't worry about it.
Using vectorization
If you are willing to take advantage of vectorization you can use a numpy array:
import numpy as np
my_dict = {'Names':['Tom', 'Mariam', 'Lata', 'Tina', 'Abin'],
'Attendance': np.array([False, False, False, False, False])}
flag_list = [0, 2, 3]
my_dict['Attendance'][flag_list] = True
But again, unless your data is very big I wouldn't worry about optimizing this piece of code very much.
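If the rest of your code expects my_dict['Attendance'] to stay a plain Python list, as in the question, one option is to do the update in NumPy and convert back afterwards. This is a minimal sketch, assuming the one-off conversion cost is acceptable for your data; the name `attendance` is introduced here only for illustration:

```python
import numpy as np

my_dict = {'Names': ['Tom', 'Mariam', 'Lata', 'Tina', 'Abin'],
           'Attendance': [False, False, False, False, False]}
flag_list = [0, 2, 3]

# Convert to a boolean array, set every flagged position in one assignment,
# then convert back to a plain list of Python bools.
attendance = np.array(my_dict['Attendance'], dtype=bool)
attendance[flag_list] = True
my_dict['Attendance'] = attendance.tolist()

print(my_dict['Attendance'])  # [True, False, True, True, False]
```

For a list this small the two conversions will likely cost more than the loop itself, so this only makes sense if the array is reused for other vectorized work.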
Example timings
import random
from timeit import Timer
import numpy as np
ATTENDANCE_LIST_SIZE = 100000
FLAG_LIST_SIZE = 60000
dict_with_numpy = {'Attendance': np.random.choice([False, True],
                                                  ATTENDANCE_LIST_SIZE)}
dict_without_numpy = {'Attendance': random.choices([False, True],
                                                   k=ATTENDANCE_LIST_SIZE)}
flag_list = random.choices(range(ATTENDANCE_LIST_SIZE), k=FLAG_LIST_SIZE)

def using_numpy():
    dict_with_numpy['Attendance'][flag_list] = True

def no_numpy_pre_fetching_list():
    attendance_list = dict_without_numpy['Attendance']
    for index in flag_list:
        attendance_list[index] = True

def no_numpy():
    for index in flag_list:
        dict_without_numpy['Attendance'][index] = True

print(f'no_numpy\t\t\t\t\t\t{min(Timer(no_numpy).repeat(3, 3))}')
print(f'no_numpy_pre_fetching_list\t\t{min(Timer(no_numpy_pre_fetching_list).repeat(3, 3))}')
print(f'using_numpy\t\t\t\t\t\t{min(Timer(using_numpy).repeat(3, 3))}')
For this amount of data, the output on my machine is (in seconds):
no_numpy 0.009737916999999985
no_numpy_pre_fetching_list 0.0048406370000000365
using_numpy 0.009164470000000036
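So for this amount of data, vectorization is not the most efficient option: it is roughly on par with the plain loop and about twice as slow as the loop that fetches the list only once.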
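One likely contributor to the NumPy result is that flag_list is a plain Python list, so NumPy has to convert it to an index array on every call. The sketch below converts it once up front instead. The names `attendance`, `flag_array` and `using_numpy_index_array` are introduced here for illustration, and no timing is claimed for this variant; whether it beats the prefetched loop is something to measure on your own machine:

```python
import random
from timeit import Timer
import numpy as np

ATTENDANCE_LIST_SIZE = 100000
FLAG_LIST_SIZE = 60000

attendance = np.zeros(ATTENDANCE_LIST_SIZE, dtype=bool)
flag_list = random.choices(range(ATTENDANCE_LIST_SIZE), k=FLAG_LIST_SIZE)

# Convert the indices once, outside the timed function, so the
# list-to-array conversion is not repeated on every call.
flag_array = np.array(flag_list)

def using_numpy_index_array():
    attendance[flag_array] = True

print(f'using_numpy_index_array\t{min(Timer(using_numpy_index_array).repeat(3, 3))}')
```

This only helps when the same indices are reused or already arrive as an array; if flag_list is built fresh as a Python list each time, the conversion cost has to be paid somewhere anyway.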