Comparing multiple numpy arrays of different lengths

Time:05-11

I'm looking for a more pythonic / numpy way to solve this problem. My current approach is very inefficient, and I need to perform this operation over several million values.

Inputs:

pos_all = np.array([1,2,3,4,5])
var_stat = np.array([True, True, False, True, True])
below_missing_threshold = np.array([False, False, True, False])

pos_all contains an array of values. The status of each value in pos_all is stored in var_stat. If var_stat is True for a value, there is a corresponding boolean for that value in the below_missing_threshold array. If var_stat is False, there is no corresponding entry in below_missing_threshold, and the value can simply be assigned True.

This is my current solution:

import sys
import numpy as np

# pos is not defined in the question; it is assumed here to hold
# the positions of the variant sites only.
pos = pos_all[var_stat]

# Validate the array lengths once, before looping.
if len(pos_all) != len(var_stat):
    sys.exit("Warning: All Position array does not match variant status array")
if len(pos) != len(below_missing_threshold):
    sys.exit("Warning: Variant Position array does not match filter status array")

accessible = []
for i in range(len(pos_all)):
    site = pos_all[i]
    is_var = var_stat[i]
    if is_var:
        # Find the index of the variant
        var_i = np.where(pos == site)[0][0]
        # Check if it passes the filter
        if not below_missing_threshold[var_i]:
            accessible.append(False)
        else:
            accessible.append(True)
    else:
        # Non-variant sites are always accessible
        accessible.append(True)

The function returns:

is_accessible = [False, False, True, True, False]

CodePudding user response:

IIUC, you can do it by creating an all-True array of the right size with np.ones_like, then using var_stat as a boolean mask to select the positions that are overwritten with the values of below_missing_threshold.

is_accessible = np.ones_like(pos_all, dtype=bool)
is_accessible[var_stat] = below_missing_threshold
print(is_accessible)
# [False False  True  True False]
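
As a minimal sketch, the same mask assignment can be wrapped in a function that performs the question's length check once up front. The name accessible_mask is hypothetical, raising ValueError instead of calling sys.exit is a choice made here for illustration, and the sketch assumes that below_missing_threshold lists the variant sites in the same order as they appear in pos_all.

```python
import numpy as np

def accessible_mask(var_stat, below_missing_threshold):
    """Hypothetical helper: True for non-variant sites, filter status for variant sites."""
    var_stat = np.asarray(var_stat, dtype=bool)
    below_missing_threshold = np.asarray(below_missing_threshold, dtype=bool)
    # One filter value is expected per variant site (per True in var_stat).
    if np.count_nonzero(var_stat) != below_missing_threshold.size:
        raise ValueError("filter status array does not match the number of variant sites")
    # Start with every site accessible, then overwrite the variant sites.
    out = np.ones(var_stat.shape, dtype=bool)
    out[var_stat] = below_missing_threshold
    return out

var_stat = np.array([True, True, False, True, True])
below_missing_threshold = np.array([False, False, True, False])
is_accessible = accessible_mask(var_stat, below_missing_threshold)
print(is_accessible)
# [False False  True  True False]
```

If the filter values may be in a different order than the variant sites, they would need to be aligned first; this sketch does not attempt that.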

CodePudding user response:

Yet another way of doing it (note that this overwrites var_stat in place):

above_thresh_idx = np.where(var_stat==False)
var_stat[np.where(var_stat==True)]=below_missing_threshold
var_stat[above_thresh_idx] = True
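
If var_stat is still needed afterwards, a possible variant of the same idea works on a copy instead. This is only a sketch using the inputs from the question; the result name is_accessible is borrowed from the question's expected output.

```python
import numpy as np

var_stat = np.array([True, True, False, True, True])
below_missing_threshold = np.array([False, False, True, False])

# Work on a copy so that var_stat itself is left untouched.
is_accessible = var_stat.copy()
# Variant sites take their filter status.
is_accessible[var_stat] = below_missing_threshold
# Non-variant sites are always accessible.
is_accessible[~var_stat] = True
print(is_accessible)
# [False False  True  True False]
```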

CodePudding user response:

IIUC, you can also do this using np.insert (although boolean indexing is the best choice):

F = np.where(var_stat == False)[0]
F_mod = np.insert(below_missing_threshold, F - np.arange(F.size), False)
result = var_stat == F_mod
# [False False  True  True False]

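Or, as a one-line alternative (each False position is shifted back by the number of False entries before it, as in the version above):

result = var_stat == np.insert(below_missing_threshold, np.flatnonzero(var_stat == 0)
                               - np.arange(np.count_nonzero(var_stat == 0)), False)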
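
The index offset in the np.insert approach matters because np.insert interprets a sequence of indices relative to the original, shorter array, so each False position in var_stat has to be shifted back by the number of False entries that precede it. The sketch below checks this on a made-up input with two adjacent non-variant sites (the arrays are illustrative, not from the question) and compares the result with the boolean-mask assignment.

```python
import numpy as np

# Hypothetical input with two adjacent non-variant sites (indices 1 and 2).
var_stat = np.array([True, False, False, True, False, True])
below_missing_threshold = np.array([True, False, False])

F = np.flatnonzero(~var_stat)        # positions of non-variant sites
insert_at = F - np.arange(F.size)    # same positions, relative to the shorter array
print(insert_at)
# [1 1 2]

# Pad the filter array with False at the non-variant positions, then compare:
# True == x gives x for variant sites, False == False gives True elsewhere.
padded = np.insert(below_missing_threshold, insert_at, False)
via_insert = var_stat == padded

# Reference result from the boolean-mask assignment.
via_mask = np.ones_like(var_stat)
via_mask[var_stat] = below_missing_threshold

assert np.array_equal(via_insert, via_mask)
```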