Home > OS >  Improve performance of Cartesian product without duplicates and repeated elements
Improve performance of Cartesian product without duplicates and repeated elements

Time:02-22

Input:

import numpy as np
import itertools
a = np.array([2,  3,  5, 13, 14, 16, 19, 20, 22, 29, 30, 34, 41, 43, 45, 49, 50,
       54, 60, 63, 66, 73, 75, 79, 82, 93]).astype(np.uint8)
b = np.array([ 9, 10, 11, 13, 14, 16, 17, 20, 22, 28, 30, 35, 36, 41, 45, 47, 49,
       54, 58, 63, 64, 66, 68, 73, 74, 76, 78, 83, 84, 86, 87, 92, 94, 95]).astype(np.uint8)
c = np.array([ 8, 11, 15, 17, 28, 32, 36, 38, 39, 42, 47, 52, 53, 58, 61, 62, 64,
       68, 69, 70, 71, 74, 78, 79, 80, 81, 83, 84, 85, 86, 89, 92]).astype(np.uint8)
d = np.array([ 6,  8, 12, 15, 21, 23, 25, 26, 31, 32, 37, 38, 44, 51, 52, 53, 56,
       57, 61, 62, 65, 67, 70, 71, 72, 81, 85, 89, 90]).astype(np.uint8)
e = np.array([ 0,  1,  4,  7, 18, 21, 23, 24, 25, 26, 27, 33, 40, 44, 46, 48, 51,
       55, 56, 57, 59, 65, 67, 77, 88, 90, 91]).astype(np.uint8)

Current Solution:

# Get product and ensure no element is repeated
main_combos = np.array([i for i in itertools.product(a, b, c, d, e) if len(set(i)) == 5])

# Remove duplicates (should not return both [13, 14, 8, 6, 0], [14, 13, 8, 6, 0])
u = np.sort(main_combos, axis=1)
_, idx = np.unique(u, axis=0, return_index=True)
main_combos = main_combos[idx]

This currently takes ~41 seconds on my machine to get all 14,133,909 rows, I was wondering if the performance of this can be improved.

The order of each individual row needs to stay the same for example the first element of each array [2, 9, 8, 6, 0] should be in this same order in the final output.

CodePudding user response:

Your solution takes 58 seconds on my machine.

If you dont require the use of Numpy, the following runs in 23 seconds on my machine, using pure Python

main_combos = {tuple(set(i)): i for i in itertools.product(a, b, c, d, e)}
main_combos = [val for key, val in main_combos.items() if len(key)==5]
  • Related