Fastest way to iterate over multiple list comprehensions

I have the following code:

import numpy as np

def func(value, start=None, end=None):
    if start is not None and start != 0:
        start = -start
    elif start == 0:
        start = None
    if end is not None:
        end = -end - 1
    return int('{:032b}'.format(value)[end:start], 2)

data = np.random.randint(1, 429496729, 10000)
starts = [10, 50, 100, 200]
stops = [30, 90, 170, 250]
data_dict = [{} for _ in range(len(starts))]

for ii, (start, stop) in enumerate(zip(starts, stops)):
    range_array = np.arange(start, stop, 2)
    data_dict[ii]['one'] = [func(value, 0, 8) for value in data[range_array]]
    data_dict[ii]['two'] = [func(value, 9, 17) for value in data[range_array]]
    data_dict[ii]['three'] = [func(value, 27, 27) for value in data[range_array]]
    data_dict[ii]['four'] = [func(value, 28, 28) for value in data[range_array]]

The problem is that this code runs relatively slowly, and every other approach I have tried so far is even slower. Does anyone have an idea how to rewrite it so that it runs faster?

CodePudding user response：

You can use NumPy broadcasting to vectorize the bit extraction with a right shift (>>) and a bitwise AND (&):

import numpy as np

np.random.seed(100)
data = np.random.randint(1, 429496729, 10000)
starts = [10, 50, 100, 200]
stops = [30, 90, 170, 250]

# equal to 'start' from calling func(value, start, end)
shift = np.array([0,9,27,28])[:, None]

# number of bits to keep: equal to 'end - start + 1' from calling func(value, start, end)
bitmask = np.array([9,9,1,1])[:, None]
  
d = [data[start:stop:2] >> shift & (2**bitmask - 1) for start, stop in zip(starts, stops)]
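To see why the shift and mask above give the same numbers as the string slicing in func, here is a minimal sketch on a single integer. The helper name extract_bits and the test value are assumptions made for illustration; they are not part of the question or the answer.

```python
def func(value, start=None, end=None):
    # String-slicing version from the question, unchanged.
    if start is not None and start != 0:
        start = -start
    elif start == 0:
        start = None
    if end is not None:
        end = -end - 1
    return int('{:032b}'.format(value)[end:start], 2)


def extract_bits(value, start, end):
    # Hypothetical helper: bits start..end inclusive, counted from the
    # least significant bit. Shift the field down, then mask its width.
    width = end - start + 1
    return (value >> start) & ((1 << width) - 1)


value = 0b1011_0110_1101_0011_0101  # arbitrary non-negative test value
for start, end in [(0, 8), (9, 17), (27, 27), (28, 28)]:
    assert extract_bits(value, start, end) == func(value, start, end)
```

The NumPy expression does the same thing for every value and every (start, width) pair at once: the column vectors shift and bitmask broadcast against the 1-D slice of data.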

Each element of the result list d is a 2-D array with one row per bit field. For example:

d[0]

Output

array([[ 54, 227, 291, 281, 229,  59, 508,  87, 365, 416],
       [ 40, 207, 353, 168, 214, 271, 338, 268, 419,  52],
       [  1,   0,   0,   0,   0,   0,   1,   1,   0,   0],
       [  0,   1,   1,   1,   0,   0,   0,   1,   1,   0]])

The rows can be accessed by name, similar to your dictionaries:

one, two, three, four = np.arange(4)
d[1][two]

Output

array([ 68, 479, 230, 295, 278, 455, 276,  45, 360, 488, 241, 336, 447,
       316, 181,  94, 138, 404, 223, 310])

If the downvote is because this answer does not produce the "exact" result, then adding:

actual = [
    {
        name: x[index].tolist()
        for index, name
        in enumerate(["one","two","three","four"])
    }
    for x in d
]

produces the exact result and maintains an order-of-magnitude boost in performance.
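One way to check that claim of identical output is to run both versions on the same data and compare them. The following is a sketch under the assumption that data holds non-negative integers below 2**32, as in the question; the variable names expected and names are introduced here for illustration only.

```python
import numpy as np


def func(value, start=None, end=None):
    # String-slicing version from the question, unchanged.
    if start is not None and start != 0:
        start = -start
    elif start == 0:
        start = None
    if end is not None:
        end = -end - 1
    return int('{:032b}'.format(value)[end:start], 2)


np.random.seed(100)
data = np.random.randint(1, 429496729, 10000)
starts = [10, 50, 100, 200]
stops = [30, 90, 170, 250]
names = ["one", "two", "three", "four"]

# Reference result: the list-comprehension loop from the question.
expected = []
for start, stop in zip(starts, stops):
    chunk = [int(v) for v in data[start:stop:2]]
    expected.append({
        "one": [func(v, 0, 8) for v in chunk],
        "two": [func(v, 9, 17) for v in chunk],
        "three": [func(v, 27, 27) for v in chunk],
        "four": [func(v, 28, 28) for v in chunk],
    })

# Vectorized result: shift each field down, then mask its width.
shift = np.array([0, 9, 27, 28])[:, None]
bitmask = np.array([9, 9, 1, 1])[:, None]
d = [data[start:stop:2] >> shift & (2**bitmask - 1)
     for start, stop in zip(starts, stops)]
actual = [{name: x[i].tolist() for i, name in enumerate(names)} for x in d]

assert actual == expected
```

To measure the speed difference on your own machine, both variants can be wrapped in functions and timed with the standard timeit module.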

CodePudding user response：

You must fix your IDE, or rather your virtual environment. Jupyter is relatively slow, but PyCharm and VS Code are very fast. Even with complex code, I have never had a problem with code being slow, except in Jupyter, where it doesn't load. Freeing up disk space on your PC by deleting unnecessary files might also help.