Fastest way to iterate over multiple list comprehensions

Time:02-08

I have the following code:

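import numpy as np

def func(value, start=None, end=None):
    # Return bits start..end (inclusive, counted from the least significant bit)
    # of value, by slicing its 32-character binary string.
    if start is not None and start != 0:
        start = -start
    elif start == 0:
        start = None
    if end is not None:
        end = -end - 1
    return int('{:032b}'.format(value)[end:start], 2)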

data = np.random.randint(1, 429496729, 10000)
starts = [10, 50, 100, 200]
stops = [30, 90, 170, 250]
data_dict = [{} for _ in range(len(starts))]

for ii, (start, stop) in enumerate(zip(starts, stops)):
    range_array = np.arange(start, stop, 2)
    data_dict[ii]['one'] = [func(value, 0, 8) for value in data[range_array]]
    data_dict[ii]['two'] = [func(value, 9, 17) for value in data[range_array]]
    data_dict[ii]['three'] = [func(value, 27, 27) for value in data[range_array]]
    data_dict[ii]['four'] = [func(value, 28, 28) for value in data[range_array]]

The problem is that this code runs relatively slowly, yet every other approach I have tried so far is even slower. Does anyone have an idea how to rewrite it so that it runs faster?

CodePudding user response:

You can use NumPy broadcasting to vectorize the bit extraction with a right shift (>>) followed by a bitwise AND (&) against a mask.

import numpy as np

np.random.seed(100)
data = np.random.randint(1, 429496729, 10000)
starts = [10, 50, 100, 200]
stops = [30, 90, 170, 250]

# equal to 'start' from calling func(value, start, end)
shift = np.array([0,9,27,28])[:, None]

# equal to 'end - start + 1' from calling func(value, start, end)
bitmask = np.array([9,9,1,1])[:, None]
  
# >> binds tighter than &, so each slice is shifted first and then masked
d = [(data[start:stop:2] >> shift) & (2**bitmask - 1) for start, stop in zip(starts, stops)]
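
To see how the shift and mask pick out each bit field, it may help to run the same expression on a few hand-built numbers. This is only an illustrative sketch: the `values` array and the name `width` (used here in place of `bitmask`) are assumptions made for the example, not part of the code above.

```python
import numpy as np

# Hypothetical sample values, built so each field is easy to read off by hand.
values = np.array([
    257,                        # bits 0-8 hold 257, everything above is 0
    0x0FFFFFFF,                 # bits 0-27 set, bit 28 clear
    (1 << 28) | (5 << 9) | 3,   # bit 28 set, bits 9-17 hold 5, bits 0-8 hold 3
], dtype=np.int64)

shift = np.array([0, 9, 27, 28])[:, None]   # shape (4, 1): lowest bit of each field
width = np.array([9, 9, 1, 1])[:, None]     # shape (4, 1): number of bits in each field

# (3,) >> (4, 1) broadcasts to (4, 3): one row per field, one column per value.
fields = (values >> shift) & ((1 << width) - 1)

print(fields.shape)
print(fields)
```

The result has shape (4, 3), with rows [257, 511, 3], [0, 511, 5], [0, 1, 0] and [0, 0, 1]: one row per (start, end) pair and one column per input value.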

To access the first entry of the result list d:

d[0]

Output

array([[ 54, 227, 291, 281, 229,  59, 508,  87, 365, 416],
       [ 40, 207, 353, 168, 214, 271, 338, 268, 419,  52],
       [  1,   0,   0,   0,   0,   0,   1,   1,   0,   0],
       [  0,   1,   1,   1,   0,   0,   0,   1,   1,   0]])

You can also access the rows by name, much like your dictionaries:

one, two, three, four = np.arange(4)
d[1][two]

Output

array([ 68, 479, 230, 295, 278, 455, 276,  45, 360, 488, 241, 336, 447,
       316, 181,  94, 138, 404, 223, 310])

In case the downvote is because this answer does not produce the "exact" result, adding:

actual = [
    {
        name: x[index].tolist()
        for index, name
        in enumerate(["one","two","three","four"])
    }
    for x in d
]

produces the exact result and maintains an order-of-magnitude boost in performance.
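
One way to check that claim on your own machine is to run both versions on the same data and compare them. The sketch below is not a definitive benchmark: the helper names `loop_version`, `vectorized_version`, `names`, `bounds`, `width` and `results_match` are made up for this example, and `int(value)` is added only so that plain Python integers are passed to `func`.

```python
import timeit
import numpy as np

def func(value, start=None, end=None):
    # String-slicing version from the question, kept as the reference.
    if start is not None and start != 0:
        start = -start
    elif start == 0:
        start = None
    if end is not None:
        end = -end - 1
    return int('{:032b}'.format(value)[end:start], 2)

starts = [10, 50, 100, 200]
stops = [30, 90, 170, 250]
names = ["one", "two", "three", "four"]
bounds = [(0, 8), (9, 17), (27, 27), (28, 28)]  # (start, end) pairs passed to func

def loop_version(data):
    # The question's approach: one list comprehension per field and slice.
    result = []
    for start, stop in zip(starts, stops):
        chunk = data[np.arange(start, stop, 2)]
        result.append({
            name: [func(int(value), lo, hi) for value in chunk]
            for name, (lo, hi) in zip(names, bounds)
        })
    return result

def vectorized_version(data):
    # The answer's approach: shift and mask all fields at once via broadcasting.
    shift = np.array([lo for lo, _ in bounds])[:, None]
    width = np.array([hi - lo + 1 for lo, hi in bounds])[:, None]
    d = [(data[start:stop:2] >> shift) & (2**width - 1)
         for start, stop in zip(starts, stops)]
    return [{name: x[i].tolist() for i, name in enumerate(names)} for x in d]

data = np.random.randint(1, 429496729, 10000)
results_match = loop_version(data) == vectorized_version(data)
print("results match:", results_match)

# Timings depend on the machine, so no figures are claimed here.
print("loop:      ", timeit.timeit(lambda: loop_version(data), number=20))
print("vectorized:", timeit.timeit(lambda: vectorized_version(data), number=20))
```

Comparing the two lists of dictionaries directly works because `tolist()` converts the NumPy rows to plain Python integers, the same type that `func` returns.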

CodePudding user response:

You must fix your IDE, or rather your virtual environment. Jupyter is relatively slow, but PyCharm and VS Code are very fast. Even with complex code, I have never had a problem with code being slow, except in Jupyter, where it doesn't load. Freeing up disk space on your PC by deleting unnecessary files might also help.
