I have lists of data points, which I look at to see if they are above a certain threshold.
I can calculate the percentage of points above the threshold, but I also need the count and the indexes of each run of consecutive points above the threshold, e.g.
points_above_threshold = [1,1,1,0,0,0,1,1] (1 is yes, 0 is no)
I need a function which returns the points in the format: [line_points, (start_index, end_index)]
e.g. the output for points_above_threshold would be [3,(0,2)],[2,(6,7)]
CodePudding user response:
Your question is lacking some detail about the format of the data you're working with. A good starting point is to specify precisely the expected input and output for your function.
For example, if your data is a list of numbers (floats) like this:
[1.56, 2.45, 8.43, ... ]
your threshold is a single floating point number, and your output is expected to be a list of tuples (index, data_point) like this:
[(1, 2.45), (2, 8.43), ... ]
Then you can write a function that looks something like this:
def get_points_above_threshold(data_list, threshold):
    output = []
    for idx, point in enumerate(data_list):
        if point > threshold:
            output.append((idx, point))
    return output
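As a quick check, here is a minimal sketch of the same idea written as a list comprehension. The sample values and the threshold of 2.0 are assumptions chosen only for illustration; the last line also shows the percentage calculation mentioned in the question.

```python
def get_points_above_threshold(data_list, threshold):
    """Return (index, value) pairs for every value strictly above threshold."""
    return [(idx, point) for idx, point in enumerate(data_list) if point > threshold]

# Hypothetical sample data and threshold, not taken from the question.
sample = [1.56, 2.45, 8.43, 0.30]
above = get_points_above_threshold(sample, 2.0)
print(above)  # [(1, 2.45), (2, 8.43)]

# Share of points above the threshold, as a percentage.
percent_above = 100 * len(above) / len(sample)
print(percent_above)  # 50.0
```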
I'll attempt to answer how to implement the points_above_threshold function you describe. We can alter the above function slightly, adding a tracking variable to compute the index ranges of values that are above the threshold, like this:
def compute_ranges(values, threshold):
    start_range = None  # start index of the "active" range, or None if there isn't one
    ranges = []  # tuples (start_idx, end_idx), inclusive
    for idx, value in enumerate(values):
        if value <= threshold:  # This either ends an "active" range, or does nothing if there isn't one.
            if start_range is None:  # If no current range, continue
                continue
            ranges.append((start_range, idx - 1))  # Otherwise end current range, append it to ranges, and reset
            start_range = None
        else:  # Otherwise, we either start an "active" range or continue one that already exists
            if start_range is None:
                start_range = idx
    if start_range is not None:  # If still an active range, append it (since range could end at end of list)
        ranges.append((start_range, len(values) - 1))
    final = [(r[1] - r[0] + 1, r) for r in ranges]  # Final conversion that adds the length of each range
    return final
If we apply this function to a list of numbers with a given threshold, it will output the ranges in the way you describe above. For example, if the input list is the simple example
[1,1,1,0,0,0,1,1]
and the threshold is, say, 0.5, then the output is
[(3, (0, 2)), (2, (6, 7))]
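For comparison, the same ranges can also be found with itertools.groupby from the standard library. This is only a sketch of an alternative, not the approach used above; the function name compute_ranges_groupby is hypothetical.

```python
from itertools import groupby


def compute_ranges_groupby(values, threshold):
    """Return (length, (start_idx, end_idx)) for each run of values above threshold."""
    result = []
    idx = 0  # index of the first element of the current run
    # groupby splits the sequence into consecutive runs that share the same key,
    # here whether the value is strictly above the threshold.
    for is_above, group in groupby(values, key=lambda v: v > threshold):
        length = len(list(group))
        if is_above:
            result.append((length, (idx, idx + length - 1)))
        idx += length
    return result


print(compute_ranges_groupby([1, 1, 1, 0, 0, 0, 1, 1], 0.5))
# [(3, (0, 2)), (2, (6, 7))]
```

Both versions treat a value equal to the threshold as not above it.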
CodePudding user response:
Using enumerate and pairwise iteration we can achieve what you want.
# enumerate helps us to isolate the indexes of 1's
points_above_threshold = [1,1,1,0,0,0,1,1]
id_ = [i for i,e in enumerate(points_above_threshold) if e == 1] # list comprehension
print(id_)
[0, 1, 2, 6, 7] # all indexes of 1's
# pairwise iteration helps us find the
# sequences of indexes, e.g. (0,1,2) and (6,7) are sequences
pairwise = [[id_[0]]] if id_ else []  # start the first sequence, if there is any index at all
for item1, item2 in zip(id_, id_[1:]):
    if item2 - item1 == 1:
        pairwise[-1].append(item2)  # consecutive index: extend the current sequence
    else:
        pairwise.append([item2])  # gap: start a new sequence (this also keeps single isolated 1's)
print(pairwise)
[[0, 1, 2], [6, 7]]
# with the code above we've just iterated over the id_ list
# and created another list with the sequences nested
# now using list comprehension we can achieve the output,
# but with tuples nested inside a list
points_above_threshold = [(len(i), (i[0], i[-1])) for i in pairwise]
print(points_above_threshold)
[(3, (0, 2)), (2, (6, 7))]
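The same steps can be wrapped in a single function. This is a minimal sketch in which the name runs_of_ones is hypothetical; it also covers a run of length one and an input with no 1's at all.

```python
def runs_of_ones(flags):
    """Return (length, (start_idx, end_idx)) for each run of 1's in flags."""
    indexes = [i for i, e in enumerate(flags) if e == 1]
    if not indexes:
        return []
    runs = [[indexes[0]]]
    # Walk over neighbouring index pairs: a difference of 1 continues the
    # current run, anything larger starts a new one.
    for prev, curr in zip(indexes, indexes[1:]):
        if curr - prev == 1:
            runs[-1].append(curr)
        else:
            runs.append([curr])
    return [(len(run), (run[0], run[-1])) for run in runs]


print(runs_of_ones([1, 1, 1, 0, 0, 0, 1, 1]))  # [(3, (0, 2)), (2, (6, 7))]
print(runs_of_ones([1, 0, 1, 1]))              # [(1, (0, 0)), (2, (2, 3))]
print(runs_of_ones([0, 0]))                    # []
```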
Hope this is helpful!