Improving the performance of cartesian product of multiple lists

I am implementing the Cartesian product of multiple sets in Python using recursion.

Here is my implementation:

def car_two_sets(a, b):
    result = []
    for x in a:
        for y in b:
            result.append(str(x) + str(y))
    return result


def car_multiple_sets(lists):
    if len(lists) == 2:
        return car_two_sets(lists[0], lists[1])
    else:
        return car_multiple_sets([car_two_sets(lists[0], lists[1])] + lists[2:])


a = [1, 2]
b = [3, 4]
c = [6, 7, 8]
lists = [a, b, c]
print(car_multiple_sets(lists))

The code works correctly, but it is slow for a larger number of sets. Any ideas on how to improve this implementation? I thought of memoization, but could not find any repeated calculations to cache.

I do not want to use itertools functions.

CodePudding user response：

Benchmark with three times more lists:

 221 us   223 us   223 us  h
 225 us   227 us   227 us  k3
 228 us   229 us   229 us  k2
 267 us   267 us   267 us  k
 340 us   341 us   342 us  g
1177 us  1185 us  1194 us  car_multiple_sets
3057 us  3082 us  3084 us  f

Code:

from timeit import repeat
from random import shuffle
from bisect import insort
from itertools import product, starmap
from operator import concat

def car_two_sets(a, b):
    result = []
    for x in a:
        for y in b:
            result.append(str(x) + str(y))
    return result


def car_multiple_sets(lists):
    if len(lists) == 2:
        return car_two_sets(lists[0], lists[1])
    else:
        return car_multiple_sets([car_two_sets(lists[0], lists[1])] + lists[2:])

def f(lists):
    return [''.join(map(str,a)) for a in product(*lists)]

def g(lists):
    return [''.join(a) for a in product(*[map(str,a)for a in lists])]

def h(lists):
    return list(map(''.join, product(*[map(str,a)for a in lists])))

def k(lists):
    result = ['']
    for lst in lists:
        lst = [*map(str, lst)]
        result = [S + s for S in result for s in lst]
    return result

def k2(lists):
    result = ['']
    for lst in lists:
        result = list(starmap(concat, product(result, map(str, lst))))
    return result

def k3(lists):
    result = ['']
    for lst in lists:
        result = starmap(concat, product(result, map(str, lst)))
    return list(result)

funcs = [car_multiple_sets, f, g, h, k, k2, k3]

a = [1, 2]
b = [3, 4]
c = [6, 7, 8]
lists = [a, b, c]

for func in funcs:
  print(func(lists), func.__name__)

times = {func: [] for func in funcs}
lists *= 3
for _ in range(50):
  shuffle(funcs)
  for func in funcs:
    t = min(repeat(lambda: func(lists), number=1))
    insort(times[func], t)
for func in sorted(funcs, key=times.get):
    print(*('%4d us ' % (t * 1e6) for t in times[func][:3]), func.__name__)

(f and g are from a now-deleted answer; the k functions are mine.)
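
If the full result does not need to exist as a list at once, the same loop as k can be written lazily without itertools. This is only a sketch and was not part of the benchmark above; the names iter_product_strings and extend are made up for illustration, and no speed claim is intended.

```python
def iter_product_strings(lists):
    """Lazily yield the concatenated strings, in the same order as k."""

    def extend(prefixes, strs):
        # Append every string in strs to every prefix produced so far.
        for p in prefixes:
            for s in strs:
                yield p + s

    result = ['']  # a single empty prefix, as in k
    for lst in lists:
        # Convert once per input list; the list is reused for every prefix.
        result = extend(result, [str(x) for x in lst])
    return result


for s in iter_product_strings([[1, 2], [3, 4], [6, 7, 8]]):
    print(s)
```

Each call to extend wraps the previous generator, so nothing is computed until the result is iterated.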

CodePudding user response：

A few comments:

If you think about it, what car_multiple_sets is doing is iterating over its parameter lists. You're doing that using recursion, but iterating over a list can also be done with a for-loop. And it so happens that recursion is somewhat slow and memory-inefficient in Python, so for-loops are preferable.
You don't need to convert to str to group the ints together. You can use tuples. That's precisely what they're for. Replace str(x) + str(y) with (x, y) to get a pair of two integers instead of a string.

def car_two_sets(a, b):
    if all(isinstance(x, tuple) for x in a):
        return [(*x, y) for x in a for y in b]
    else:
        return [(x, y) for x in a for y in b]

def car_multiple_sets(lists):
    if len(lists) == 0:
        return [()]
    elif len(lists) == 1:
        return [(x,) for x in lists[0]]
    else:
        result = car_two_sets(lists[0], lists[1])
        for l in lists[2:]:
            result = car_two_sets(result, l)
        return result

print( car_multiple_sets((range(3), 'abc', range(2))) )
# [(0, 'a', 0), (0, 'a', 1), (0, 'b', 0), (0, 'b', 1), (0, 'c', 0), (0, 'c', 1),
#  (1, 'a', 0), (1, 'a', 1), (1, 'b', 0), (1, 'b', 1), (1, 'c', 0), (1, 'c', 1),
#  (2, 'a', 0), (2, 'a', 1), (2, 'b', 0), (2, 'b', 1), (2, 'c', 0), (2, 'c', 1)]
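
The same loop can be sketched more compactly by starting from a list holding one empty tuple, which removes the isinstance check and the special cases for zero or one list. The name cartesian_product is hypothetical, and the sketch assumes every input can be iterated more than once (lists, ranges, strings), not one-shot generators.

```python
def cartesian_product(lists):
    # The product of zero lists is a single empty tuple.
    result = [()]
    for lst in lists:
        # Extend every partial tuple with every element of the next list.
        result = [(*prefix, item) for prefix in result for item in lst]
    return result


print(cartesian_product((range(3), 'abc', range(2)))[:4])
# [(0, 'a', 0), (0, 'a', 1), (0, 'b', 0), (0, 'b', 1)]
```

Because the accumulated prefix is always a tuple, elements that are themselves tuples are kept as single items rather than being unpacked.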