I'm looking for a more Pythonic / NumPy way to solve this problem. My current approach is incredibly inefficient, and I am performing this operation over several million values.
Inputs:
pos_all = np.array([1,2,3,4,5])
var_stat = np.array([True, True, False, True, True])
below_missing_threshold = np.array([False, False, True, False])
pos_all contains an array of values. The status of each value in pos_all is contained in var_stat. If var_stat for the value is True, there is a corresponding boolean for that value in the below_missing_threshold array. If var_stat is False, there is no corresponding value in below_missing_threshold, but we can assign it a value of True.
This is my current solution:
import sys
import numpy as np

# pos is not shown in the inputs above; it is assumed here to hold the
# positions of the variant sites only.
pos = pos_all[var_stat]

accessible = []
for i in range(0, len(pos_all)):
    if len(pos_all) != len(var_stat):
        sys.exit("Warning: All Position array does not match variant status array")
    if len(pos) != len(below_missing_threshold):
        sys.exit("Warning: Variant Position array does not match filter status array")
    else:
        site = pos_all[i]
        is_var = var_stat[i]
        if is_var == True:
            # Find the index of the variant
            var_i = np.where(pos == site)[0][0]
            # Check if it passes filter
            if below_missing_threshold[var_i] == False:
                accessible.append(False)
            else:
                accessible.append(True)
        else:
            # Non-variant sites are always accessible
            accessible.append(True)
The function returns:
is_accessible = [False, False, True, True, False]
CodePudding user response:
IIUC, you can do it by creating a True array of the right size with np.ones_like, then using var_stat to select the values that have to be reassigned from below_missing_threshold:
is_accessible = np.ones_like(pos_all, dtype=bool)
is_accessible[var_stat] = below_missing_threshold
print(is_accessible)
# [False False True True False]
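To replace the length checks from the original loop, the mask assignment can be wrapped in a small function that validates its inputs once. This is only a sketch: the function name compute_accessible is made up for illustration, and it assumes below_missing_threshold is ordered the same way as the variant sites appear in var_stat.

```python
import numpy as np


def compute_accessible(var_stat, below_missing_threshold):
    """Return True for non-variant sites and the filter result for variant sites."""
    var_stat = np.asarray(var_stat, dtype=bool)
    below_missing_threshold = np.asarray(below_missing_threshold, dtype=bool)

    # One filter value is expected per variant site (per True in var_stat).
    n_variants = np.count_nonzero(var_stat)
    if below_missing_threshold.size != n_variants:
        raise ValueError(
            f"expected {n_variants} filter values, got {below_missing_threshold.size}"
        )

    # Start with every site accessible, then overwrite the variant sites.
    is_accessible = np.ones(var_stat.shape, dtype=bool)
    is_accessible[var_stat] = below_missing_threshold
    return is_accessible


var_stat = np.array([True, True, False, True, True])
below_missing_threshold = np.array([False, False, True, False])
result = compute_accessible(var_stat, below_missing_threshold)
print(result.tolist())  # [False, False, True, True, False]
```

Raising ValueError instead of calling sys.exit is a design choice of this sketch, so that the caller can decide how to handle a mismatch.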
CodePudding user response:
Yet another way of doing it (note that this overwrites var_stat in place):
above_thresh_idx = np.where(var_stat==False)
var_stat[np.where(var_stat==True)]=below_missing_threshold
var_stat[above_thresh_idx] = True
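If var_stat is still needed afterwards, the same idea can be written without modifying it. A minimal sketch, using the example inputs from the question: ~var_stat creates a new array that is already True at the non-variant sites, so only the variant sites need to be filled in.

```python
import numpy as np

var_stat = np.array([True, True, False, True, True])
below_missing_threshold = np.array([False, False, True, False])

# ~var_stat is a new array: True at non-variant sites, False at variant sites.
is_accessible = ~var_stat
# Fill the variant sites with their filter results; var_stat itself is untouched.
is_accessible[var_stat] = below_missing_threshold

print(is_accessible.tolist())  # [False, False, True, True, False]
```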
CodePudding user response:
IIUC, you can do this using np.insert (however, indexing is the best choice) by:
F = np.where(var_stat == False)[0]
F_mod = np.insert(below_missing_threshold, F - np.arange(F.size), False)
result = var_stat == F_mod
# [False False True True False]
or, as another alternative, in one line (the insertion indices need the running offset np.arange, not the total count):
result = var_stat == np.insert(below_missing_threshold, np.flatnonzero(var_stat==0)
- np.arange(np.count_nonzero(var_stat==0)), False)
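One way to gain confidence in the np.insert approach is to compare it against plain boolean-mask assignment on random inputs. The sketch below does that; the array size, seed, and probabilities are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
var_stat = rng.random(1000) < 0.7
below_missing_threshold = rng.random(np.count_nonzero(var_stat)) < 0.5

# Boolean-mask approach: all True, then overwrite the variant sites.
expected = np.ones_like(var_stat)
expected[var_stat] = below_missing_threshold

# np.insert approach: pad the filter array with False at non-variant sites.
# Each insertion index is shifted back by the number of earlier insertions.
F = np.flatnonzero(~var_stat)
F_mod = np.insert(below_missing_threshold, F - np.arange(F.size), False)
result = var_stat == F_mod

print(np.array_equal(result, expected))  # True
```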