Combinations & Numpy-CodePudding

I need to rate each combination in order to get the best one.

I completed the first step but it is not optimized at all.

When the value of RQ or NBPIS or NBSER is big, my code is much too long.

Do you have an idea to get the same result much faster?

Thank you very much

    import numpy as np    
    from itertools import combinations, combinations_with_replacement
    
    #USER SETTINGS
    
    RQ=['A','B','C','D','E','F','G','H']
    NBPIS=3
    NBSER=3
    
    #CODE
    
    Combi1=np.array(list(combinations_with_replacement(RQ,NBPIS)))  
    Combi2=combinations_with_replacement(Combi1,NBSER)
    Combi3=np.array([])

    Compt=0
    First=0

    for X in Combi2:
        
        Long=0
        Compt=Compt 1
        Y=np.array(X)
            
        for Z in RQ:
            
            Long=Long 1
            
            if Z not in Y:
                break
            
            elif Long==len(RQ):
               
                if First==0:
                    Combi3=Y
                    Combi3 = np.expand_dims(Combi3, axis = 0)
                    First=1
                    
                else:
                    Combi3=np.append(Combi3, [Y], axis = 0)
    
    #RESULTS
    
    print(Combi3)
    print(Combi3.shape)          
    print(Compt)

CodePudding user response：

Assuming your code produces the desirable result, the first step to optimize your code is refactoring it. This might help others to jump in and help as well.
Let's start making a function of it.

#USER SETTINGS
RQ=['A','B','C','D','E','F','G','H']
NBPIS=3
NBSER=3

def your_code():
    Combi1=np.array(list(combinations_with_replacement(RQ,NBPIS)))
    Combi2=combinations_with_replacement(Combi1,NBSER)
    Combi3=np.array([])

    Compt=0
    First=0
    for X in Combi2:
        Long=0
        Compt=Compt 1
        Y=np.array(X)
        for Z in RQ:
            Long=Long 1
            if Z not in Y:
                break
            elif Long==len(RQ):
                if First==0:
                    Combi3=Y
                    Combi3 = np.expand_dims(Combi3, axis = 0)
                    First=1
                else:
                    Combi3=np.append(Combi3, [Y], axis = 0)

    shape = Combi3.shape 
    size = Compt
    return Combi3, shape, size

Refactoring

Notice that Compt is equal to len(Combi2), so turning Combi2 as a numpy array will help to simplify the code. This also allow the variable Y to be replaced by X only. Also, there is no need for Combi1 to be a numpy array since it is only consumed by combinations_with_replacement.

def your_code_refactored():
    Combi1 = combinations_with_replacement(RQ,NBPIS)
    Combi2 = np.array(list(combinations_with_replacement(Combi1,NBSER)))
    Combi3=np.array([])

    First=0
    for X in Combi2:
        Long=0
        for Z in RQ:
            Long=Long 1
            if Z not in X:
                break
            elif Long==len(RQ):
                if First==0:
                    Combi3=X
                    Combi3 = np.expand_dims(Combi3, axis = 0)
                    First=1
                else:
                    Combi3=np.append(Combi3, [X], axis = 0)

    shape = Combi3.shape 
    size = len(Combi2)
    return Combi3, shape, size

Next thing is to refactor how Combi3 is created and populated. The varialbe First is used to expand Combi3 dimension in the first interaction only, so this logic can be simplified as,

def your_code_refactored():
    Combi1 = combinations_with_replacement(RQ,NBPIS)
    Combi2 = np.array(list(combinations_with_replacement(Combi1,NBSER)))
    Combi3 = np.empty((0, NBPIS, NBSER))

    for X in Combi2:
        Long=0
        for Z in RQ:
            Long=Long 1
            if Z not in X:
                break
            elif Long==len(RQ):
                Combi3 = np.append(Combi3, [X], axis = 0)

    shape = Combi3.shape 
    size = len(Combi2)
    return Combi3, shape, size

It seems Combi2 is populated only with combinations that have at least one of each element from RQ. This is accomplished by testing if each element of RQ is in X, which is basically checking if RQ is a subset of X. So it is simplified further,

def your_code_refactored():
    Combi1 = combinations_with_replacement(RQ,NBPIS)
    Combi2 = np.array(list(combinations_with_replacement(Combi1,NBSER)))
    Combi3 = np.empty((0, NBPIS, NBSER))

    unique_RQ = set(RQ)
    for X in Combi2:
        if unique_RQ.issubset(X.flatten()):
            Combi3 = np.append(Combi3, [X], axis = 0)

    shape = Combi3.shape 
    size = len(Combi2)
    return Combi3, shape, size

This looks much simpler. Time to make it faster, maybe :)

Optimizing

One way this code can be optimized is to replace np.append by list.append. In numpy's documentation we see that np.append reallocate a larger and larger array each time it is called. The code might be speed up with list.append, since it over-allocates memory. So the code could be rewritten with list comprehension.

def your_code_refactored_and_optimized():
    Combi1 = combinations_with_replacement(RQ,NBPIS)
    Combi2 = np.array(list(combinations_with_replacement(Combi1,NBSER)))

    unique_RQ = set(RQ)
    Combi3 = np.array([X for X in Combi2 if unique_RQ.issubset(X.flatten())])

    shape = Combi3.shape 
    size = len(Combi2)
    return Combi3, shape, size

Testing

Now we can see it run faster.

import timeit
n = 5
print(timeit.timeit('your_code()', globals=globals(), number=n))
print(timeit.timeit('your_code_refactored_and_optimized()', globals=globals(), number=n))

This isn't much a gain in speed but it's something :)

CodePudding user response：

I have an idea to reduce execution time by removing unnecessary combinations, simplifying with the following example with :

RQ=['A','B','C']
NBPIS=3
NBSER=3

Currently with :

Combi1 = combinations_with_replacement(RQ,NBPIS)
print(list(Combi1))

[('A', 'A', 'A'), ('A', 'A', 'B'), ('A', 'A', 'C'), ('A', 'B', 'B'), ('A', 'B', 'C'), ('A', 'C', 'C'), ('B', 'B', 'B'), ('B', 'B', 'C'), ('B', 'C', 'C'), ('C', 'C', 'C')]

But with :

Combi1 = list(list(combinations(RQ,W)) for W in range(1,NBPIS 1))
print(Combi1)

[[('A',), ('B',), ('C',)], [('A', 'B'), ('A', 'C'), ('B', 'C')], [('A', 'B', 'C')]]

Problem with :

Combi1 = list(list(combinations(RQ,W)) for W in range(1,NBPIS 1))

Error message :

Combi3 = np.array([X for X in Combi2 if unique_RQ.issubset(X.flatten())])

TypeError: unhashable type: 'list'

But with :

(Combi1 = combinations(RQ,W) for W in range(1,NBPIS 1))
print(Combi3)

[]

Questions :

For Combi1,

Instead of :

[[('A',), ('B',), ('C',)], [('A', 'B'), ('A', 'C'), ('B', 'C')], [('A', 'B', 'C')]]

how to get this ? :

[('A'), ('B'), ('C'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('A', 'B', 'C')]

For Combi3, is it possible to get an array with different sizes ?

Instead of :

[[['A' 'A' 'A'] ['A' 'A' 'A'] ['A' 'B' 'C']]...

Obtain this ? :

[[['A'] ['A'] ['A' 'B' 'C']]...