speed up list iteration bottleneck


I have a bottleneck in a piece of code that is ruining the performance of my program. I rewrote the section, but after timing it, things didn't improve.

The problem is as follows. Given a list of fixed-length lists of integers

data = [[1,2,3], [3,2,1], [8,1,0], [1,3,4]]

For each column of the data, I need to build a separate list containing each sublist's index repeated as many times as that sublist's value in that column.

For instance, for the above data there will be three resulting lists, since the sublists have three columns.

There are 4 sublists, so the indices 0-3 can appear in the final lists (an index is skipped wherever its value in that column is 0, as happens to index 2 in the third list below).

We expect the following three lists to be generated from the above data

[[0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3],
[0, 0, 1, 1, 2, 3, 3, 3],
[0, 0, 0, 1, 3, 3, 3, 3]]

I have two ways of doing this:

# Approach 1: explicit nested loops
processed_data = [[] for _ in range(len(data[0]))]
for n, sub_list in enumerate(data):
    for k, proc_list in enumerate(processed_data):
        for _ in range(sub_list[k]):
            proc_list.append(n)


# Approach 2: transpose with zip and flatten via a list comprehension
processed_data = []
for col in zip(*data):
    processed_data.append([j for j, count in enumerate(col) for _ in range(count)])

The average size of the data list is around 100,000.

Is there a way I can speed this up?

CodePudding user response:

You can't improve the computational complexity of your algorithm unless you're able to tweak the output format (see below). In other words, you'll at best be able to improve the speed by a modest percentage (and the percentage will be independent of the size of the input).

I don't see any obvious implementation issues. The one idea I had was to get rid of the large number of append() calls, and the overhead incurred by gradual list expansion, by preallocating the output matrix. However, @juanpa.arrivillaga notes in a comment that append() is in fact heavily optimized on CPython. If you're on another interpreter, you could still try it: the length of the output list for column c equals the sum of the input values in column c, so you can preallocate each output list as [0] * sum_of_input_values_at_column_c and then do proc_list[i] = n instead of proc_list.append(n) (incrementing i manually). This does, however, require two passes over the input, so it might not actually be an improvement; your problem is quite memory-bound, since its core computation is extremely simple.
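
A minimal sketch of that two-pass approach (col_sums and write_pos are illustrative names, not anything from the original code):

col_sums = [sum(col) for col in zip(*data)]       # pass 1: output size per column
processed_data = [[0] * s for s in col_sums]      # preallocate each output list
write_pos = [0] * len(col_sums)                   # next free slot per column

for n, sub_list in enumerate(data):               # pass 2: fill the slots in place
    for k, count in enumerate(sub_list):
        proc_list = processed_data[k]
        i = write_pos[k]
        for _ in range(count):
            proc_list[i] = n                      # assignment instead of append
            i += 1
        write_pos[k] = i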

The reason you can't improve the computational complexity is that it is already optimal: any algorithm must spend time generating its output, so the size of the output is a lower bound on the running time. In your case, the size of the output equals the sum of the values in the input matrix (and it's generally considered bad when the cost depends on the input values themselves rather than on the number of input values). That is exactly the number of iterations your algorithm performs, so it is optimal.

However, if the output of this function is going to reside in memory to be consumed by another function (rather than being written to a file), and you are able to make some adaptations in that function, you could instead output a matrix of generators, where each generator knows that it needs to generate sub_list[k] occurrences of n. Then the complexity of your algorithm becomes proportional to the size of the input matrix (but consuming the output will still take the same amount of time it would have taken to generate the full output).
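
Here is a minimal sketch of that idea using itertools.repeat (lazy_matrix is an illustrative name). Note that each repeat() object is a one-shot iterator, so each column can only be consumed once:

from itertools import chain, repeat

# Building this costs time proportional to the input size only; the
# expansion work is deferred until a consumer iterates the generators.
lazy_matrix = [[repeat(n, count) for count in sub_list]
               for n, sub_list in enumerate(data)]

# A consumer reads column k by chaining that column's generators:
column_0 = list(chain.from_iterable(row[0] for row in lazy_matrix))
print(column_0)   # [0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3]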

CodePudding user response:

Perhaps itertools can make this faster for you by minimizing the amount of Python code inside the loops:

data = [[1,2,3], [3,2,1], [8,1,0], [1,3,4]]

from itertools import chain, repeat, starmap

# For each column: repeat each row index by its count, then flatten.
result = [list(chain.from_iterable(starmap(repeat, r)))
          for r in map(enumerate, zip(*data))]

print(result)
           
[[0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3], 
 [0, 0, 1, 1, 2, 3, 3, 3], 
 [0, 0, 0, 1, 3, 3, 3, 3]]

If you're processing the output in the same order as the result's rows come out, you can convert this to a generator and use it directly in your main process:

iResult = ( chain.from_iterable(starmap(repeat,r)) 
            for r in map(enumerate,zip(*data))  )
            
for iRow in iResult:           # iRow is also an iterator
    for resultItem in iRow:    
        # Perform your item processing here
        print(resultItem, end=" ")
    print()

0 1 1 1 2 2 2 2 2 2 2 2 3 
0 0 1 1 2 3 3 3 
0 0 0 1 3 3 3 3

This avoids creating and storing the lists of indexes altogether (i.e. it brings that bottleneck down to zero), but only if you process the result sequentially.
