Creating m*n arrays from an array and a boolean "shared" array

The problem: I am trying to generate m vectors of n elements from a "packed" (shorter) master vector V, whose length is less than m x n, and a boolean vector of length n that determines which elements are repeated across vectors and which are taken fresh from V. (The vectors are explained in more detail below.) How the master vector is created and how the results are used matter only in that this format must be respected: a master vector plus a boolean vector, producing an m x n result.

For example, if Element j has a boolean of False, all m vectors have the same value for that element, namely V[j]. If Element j has a boolean of True, each vector takes its own value for it from V. In the example below, Element 0 is True, so vector 0 gets V[0] while vector 1 gets V[6]; Element 1 is False, so every vector gets V[1]. A master vector V of

(1,2,3,4,5,6,10,30,40,60,100,300,400,600)

and a boolean vector of

1, 0, 1, 1, 0, 1

should produce three resulting vectors:

[1 2 3 4 5 6]
[10.  2. 30. 40.  5. 60.]
[100.   2. 300. 400.   5. 600.]

These share some elements but not others. I have a method for this, but it relies on nested loops and if statements. What I've tried is the following working but inefficient example, which produces 3 resulting vectors of 6 elements:

import numpy as np

p = np.array((1,2,3,4,5,6,10,30,40,60,100,300,400,600))
genome = np.array((1, 0, 1, 1, 0, 1))

index = 0
for i in range(0,3):
    
    if i==0:
        pBase = p[0:genome.size]
        print(pBase)
    else:
        extra = np.zeros(genome.size)
        for j in range(0,genome.size):
            if genome[j]==True:
                extra[j] = p[genome.size + index]  # next unused element of p
                index += 1
        pSplit = np.where(genome==False, pBase, extra)
        print(pSplit)

prints (as expected):

[1 2 3 4 5 6]
[10.  2. 30. 40.  5. 60.]
[100.   2. 300. 400.   5. 600.]

It takes 45.1 µs ± 2.4 µs per loop. This seems unnecessarily verbose and slow for what should be a simple operation, but I don't know of any alternative methods. Is there some combination of list comprehensions or other functions that can produce the same results in a faster and more Pythonic way?

EDIT: The values of V will not always follow a simple pattern like multiplying by 10^i; the given vector is just for demonstration. The values should be considered arbitrary (generated by another method and following no reproducible pattern like 10^i).
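Since the values are arbitrary, it may help to state the rule purely as indices into V. With n = len(b) and s = the number of True entries, vector 0 is V[0:n]. For vector k >= 1, element j comes from V[j] when b[j] is False, and from V[n + (k - 1)*s + r_j] when b[j] is True, where r_j counts the True entries before position j. The following is a minimal pure-Python sketch of that mapping. The function name `source_indices` is hypothetical, and it assumes len(V) - n is an exact multiple of s with at least one True entry:

```python
def source_indices(n_total, b):
    """Return, for each output vector, the list of indices into V it reads.

    Hypothetical helper: assumes n_total - len(b) is a multiple of sum(b)
    and that b contains at least one True entry.
    """
    n = len(b)
    s = sum(1 for flag in b if flag)   # new elements consumed per extra vector
    m = (n_total - n) // s + 1         # total number of output vectors
    # r[j] = number of True entries strictly before position j
    r, count = [], 0
    for flag in b:
        r.append(count)
        if flag:
            count += 1
    rows = [list(range(n))]            # vector 0 is simply V[0:n]
    for k in range(1, m):
        rows.append([n + (k - 1) * s + r[j] if b[j] else j for j in range(n)])
    return rows


b = [1, 0, 1, 1, 0, 1]
idx = source_indices(14, b)
print(idx)
# → [[0, 1, 2, 3, 4, 5], [6, 1, 7, 8, 4, 9], [10, 1, 11, 12, 4, 13]]

V = [1, 2, 3, 4, 5, 6, 10, 30, 40, 60, 100, 300, 400, 600]
print([[V[i] for i in row] for row in idx])
```

Because only indices are computed, the same mapping applies whatever values V holds.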

Answer:

This program works differently so that it also supports vectors whose values are not powers of 10. It first puts the base vector into vectors and then adds as many further vectors as can be built. Each new vector is generated as follows: if the position in the boolean vector is 1, the program takes the next element from rest, which holds all the elements not used yet, and updates rest. Otherwise the value in the boolean vector is 0, so the program takes the value from vectors[0][i], which is the same as taking V[i].

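V = [1, 2, 3, 4, 5, 6, 10, 30, 40, 60, 100, 300, 400, 600]
boolean = [1, 0, 1, 1, 0, 1]
vectors = [V[:len(boolean)]]   # vector 0: the first n elements of V
rest = V[len(boolean):]        # elements not used yet
# keep building while enough unused elements remain for a full vector
# (the sum(boolean) > 0 check avoids an infinite loop when no entry is 1)
while sum(boolean) > 0 and len(rest) >= sum(boolean):
    newv = []
    for i, x in enumerate(boolean):
        if x == 1:
            newv.append(rest[0])       # take the next unused element
            rest = rest[1:]
        else:
            newv.append(vectors[0][i]) # shared element, same as V[i]
    vectors.append(newv)
print(vectors)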
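A small variation on the loop above, offered as a sketch rather than as the answerer's code: rest = rest[1:] copies the remaining list on every step, so for long master vectors it may be cheaper to walk the unused elements with an iterator. This version computes the number of extra vectors up front and, like the while condition above, ignores any leftover elements that cannot fill a complete vector:

```python
V = [1, 2, 3, 4, 5, 6, 10, 30, 40, 60, 100, 300, 400, 600]
boolean = [1, 0, 1, 1, 0, 1]

n = len(boolean)
s = sum(boolean)                           # new elements consumed per extra vector
base = V[:n]                               # vector 0
rest = iter(V[n:])                         # unused elements, consumed in order
n_extra = (len(V) - n) // s if s else 0    # leftover elements are ignored

vectors = [base]
for _ in range(n_extra):
    # next(rest) pulls the next unused element only where the flag is 1;
    # elsewhere the shared value from the base vector is reused
    vectors.append([next(rest) if x == 1 else base[i]
                    for i, x in enumerate(boolean)])
print(vectors)
```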

Answer:

If I understand the question correctly, I think the following performs the task more cleanly, but let me know what you think.

def convert_vectors(master_vector, boolean_vector):
    """
    example:
    master_vector = [1,2,3,4,5,6,10,30,40,60,100,300,400,600]
    boolean_vector = [1,0,1,1,0,1]
    result = [[1, 2, 3, 4, 5, 6],[10, 2, 30, 40, 5, 60],[100, 2, 300, 400, 5, 600]]
    """
    res = []  # result
    curIndexInMaster = 0  # index in master_vector
    while curIndexInMaster < len(master_vector):
        curArray = []  # current array
        for is_new in boolean_vector:  # for each element in boolean_vector
            if is_new:  # take a new element from master_vector
                curArray.append(master_vector[curIndexInMaster])
                curIndexInMaster += 1
            else:  # shared element: reuse the value at this position in the first array
                curArray.append(master_vector[len(curArray)])
                if curIndexInMaster < len(boolean_vector):  # only for first array
                    curIndexInMaster += 1
        res.append(curArray)
    return res


master_vector = [1, 2, 3, 4, 5, 6, 10, 30, 40, 60, 100, 300, 400, 600]
boolean_vector = [1, 0, 1, 1, 0, 1]
print(convert_vectors(master_vector, boolean_vector))

Output:

[[1, 2, 3, 4, 5, 6], [10, 2, 30, 40, 5, 60], [100, 2, 300, 400, 5, 600]]
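One caveat with this function: if len(master_vector) - len(boolean_vector) is not an exact multiple of the number of 1s, the last array is incomplete and master_vector[curIndexInMaster] raises an IndexError partway through it. If the boolean vector contains no 1s and master_vector is longer than it, the while loop never terminates. Below is a minimal sketch of a check you could run first. The name `count_vectors` is hypothetical and not part of the answer. It returns m or raises a clear error:

```python
def count_vectors(master_len, boolean_vector):
    """Return m, the number of output vectors, or raise ValueError.

    Hypothetical helper, not part of the answer above.
    """
    n = len(boolean_vector)
    s = sum(1 for flag in boolean_vector if flag)
    if master_len < n:
        raise ValueError("master vector is shorter than the boolean vector")
    extra = master_len - n
    if s == 0:
        if extra:
            raise ValueError("no True entries, but unused master elements remain")
        return 1
    if extra % s:
        raise ValueError(f"{extra} leftover elements is not a multiple of {s}")
    return extra // s + 1


print(count_vectors(14, [1, 0, 1, 1, 0, 1]))   # → 3
```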

Answer:

Try the following code:

import numpy as np

p = np.array((1, 2, 3, 4, 5, 6, 10, 30, 40, 60, 100, 300, 400, 600))
genome = np.array((1, 0, 1, 1, 0, 1))

gs1 = genome.size             # Number of elements
gs2 = genome.sum()            # Number of "True" values
idx_p = np.r_[gs1 : p.size + 1 : gs2]  # Boundary indices of the sections in "p"
idx_g = np.where(genome)[0]   # Indices of "true" in "genome"
res = np.tile(p[0:gs1], idx_p.size).reshape(-1, gs1)  # result
iStart = gs1
iRow = 1          # Row number in "res"
for iEnd in idx_p[1:]:
    np.put(res[iRow], idx_g, p[iStart : iEnd])  # res[iRow] is a view, so res changes
    iRow += 1
    iStart = iEnd
print(res)

It relies on a couple of NumPy "tricks": np.r_ to get the boundary indices of the source "sections" of p, and np.put to write these sections into consecutive rows of res at the correct column indices.

For details, see the documentation of the respective NumPy functions.

For your source data the result (res array) is:

[[  1   2   3   4   5   6]
 [ 10   2  30  40   5  60]
 [100   2 300 400   5 600]]
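To see what the two NumPy calls in this answer do on their own: when np.r_ is given slice notation, start:stop:step behaves like np.arange(start, stop, step), which here yields the section boundaries 6, 10 and 14 of p. np.put replaces elements of the flattened target at the given positions in place. Because basic indexing such as res[1] returns a view, writing into it changes res itself. A small standalone demonstration:

```python
import numpy as np

p = np.array((1, 2, 3, 4, 5, 6, 10, 30, 40, 60, 100, 300, 400, 600))

# Slice notation inside np.r_ acts like np.arange(start, stop, step)
bounds = np.r_[6 : p.size + 1 : 4]
print(bounds.tolist())             # → [6, 10, 14]

# np.put writes values into the (flattened) target at the given positions
row = np.zeros(6, dtype=p.dtype)
np.put(row, [0, 2, 3, 5], p[6:10])
print(row.tolist())                # → [10, 0, 30, 40, 0, 60]

# res[1] is a view of a row of res, so np.put on it modifies res in place
res = np.zeros((2, 6), dtype=p.dtype)
np.put(res[1], [0, 2, 3, 5], p[10:14])
print(res.tolist())                # → [[0, 0, 0, 0, 0, 0], [100, 0, 300, 400, 0, 600]]
```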

Answer:

You can try this:

import numpy as np

V = np.array((1, 2, 3, 4, 5, 6, 10, 30, 40, 60, 100, 300, 400, 600))
b = np.array([1, 0, 1, 1, 0, 1]).astype(bool)

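nc = len(b)                              # number of columns (n)
nr = (len(V) - len(b)) // b.sum() + 1    # number of rows (m)
out = np.tile(V[:nc], reps=(nr, 1))      # every row starts as a copy of V[:n]
# rows 1.. : overwrite the True columns, in row-major order, with the rest of V
out[1:][np.tile(b, reps=(nr - 1, 1))] = V[nc:]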
print(out)

It gives:

[[  1   2   3   4   5   6]
 [ 10   2  30  40   5  60]
 [100   2 300 400   5 600]]
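A different technique, sketched here as an alternative rather than as this answer's method, is to build the full m x n matrix of source indices and gather with a single fancy-indexing step, V[idx]. It uses np.cumsum to number the True positions and broadcasting to offset each row. Like the answer above, it assumes that len(V) - len(b) is an exact multiple of b.sum(). Because it only computes indices, it works the same for arbitrary values of V:

```python
import numpy as np

V = np.array((1, 2, 3, 4, 5, 6, 10, 30, 40, 60, 100, 300, 400, 600))
b = np.array([1, 0, 1, 1, 0, 1]).astype(bool)

n = b.size
s = b.sum()                          # new elements per extra vector
m = (V.size - n) // s + 1            # number of output vectors

rank = np.cumsum(b) - 1              # position of each True among the Trues
k = np.arange(1, m)[:, None]         # row numbers of the extra vectors, as a column
# Rows 1..m-1: False columns read V[j]; True columns read the row's block of V
extra_idx = np.where(b, n + (k - 1) * s + rank, np.arange(n))
idx = np.vstack([np.arange(n), extra_idx])   # prepend row 0 = indices 0..n-1

out = V[idx]
print(out.tolist())
```

The index matrix idx can also be reused if several master vectors share the same boolean vector.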