How do you update a list, given a list of indexes to be udpated.
For example, lets say we have a list
l = ['4', '5', '8', '19', '2', '53', '125']
and
indexes = [0, 1, 4, 5]
Say we want to update these indexes to an int
. The desired updated list will be
l = [4, 5, '8', '19', 2, 53, '125']
Yes we can do easily with a loop like:
for i in indexes:
l[i] = int(l[i])
But actually, something like this is a bit faster
l[0] = int(l[0])
l[1] = int(l[1])
l[4] = int(l[4])
l[5] = int(l[5])
The difference in speed may not be noticeable here since there are only 4 items, but once there are 40 or more, you can see the difference. But its also a pain to write so many repetitive statements.
Is there a way to optimize this a bit more, maybe using map
and generating a list to update the list (based on specific indexes only or something like that)
CodePudding user response:
Doing something as simple as a replacement of a list element 40 times is not really a test of how it will perform at scale. Consider this instead:
from timeit import timeit
from random import shuffle
def f(xs, indices):
for i in indices:
xs[i] = int(xs[i])
def main():
for p in range(6):
n = 10**(p 3)
shuffle(indices := list(range(n)))
indices = [indices.pop() for __ in range(n // 100)]
xs = [str(x) for x in range(n)]
print(n, timeit(lambda: f(xs, indices), number=1) / n)
main()
Result:
1000 2.799999999997249e-09
10000 1.4999999999987245e-09
100000 2.817999999999987e-09
1000000 3.4772999999999053e-09
10000000 3.8900799999999515e-09
100000000 5.486388000000062e-09
(this shows the time spent on average in a single iteration, for a list of increasing length)
The size of the original list affects performance in a non-trivial way. Also, consider that the code with just replacements instructions has to come from somewhere and since you probably won't write all of it by hand for large lists, you will need to generate it - and that code is guaranteed to be more costly than the actual loop.
So either you only need this for manageably small number of updates, in which case you're right and writing out the instructions is faster - but makes your code larger (affecting load time and free memory). Or you need this for a larger number of updates and it's either not worth it since there is no efficient way to generate the code as an alternative - or you're OK with generating the code elsewhere as well as being OK with sacrificing the time and space needed to load your much larger program, with the only benefit being a marginally faster loop.
However, it seems unlikely that this is an optimisation worth looking at in the first place. The gains here are never going to be more than tiny fractions of seconds, seconds at most for extremely large lists - there must innumerable other places in your code that would allow for far greater gains.
And finally, if you insist speed matters here, you should consider doing the work in better data structures than a list of strings, or a language like Python.
CodePudding user response:
[int(l[i]) if i in indexes else l[i] for i in range(len(l))]
# [4, 5, '8', '19', 2, 53, '125']
%timeit [int(l[i]) if i in indexes else l[i] for i in range(len(l))]
2.82 µs ± 566 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)