I have a list of random data and a threshold:
threshold = 3
data = [2,2,2,2,2,5,5,2,2,2,2,3,4,5,6,4,5,4,3,4,5,3,3,7,8,2,2,2] # data
timestamp = []
for i in range(len(data)):
    timestamp.append(i)
print(timestamp)
I am trying to extract the timestamps whose values are below the threshold. In addition, if a short run of consecutive timestamps (fewer than 4) that are not below the threshold sits between two time ranges that are below it, that run should also be treated as below the threshold.
As such, this example should return:
belowthreshold = [0,1,2,3,4,5,6,7,8,9,10,25,26,27]
So we can see that the consecutive 5,5 is not excluded: it is treated as under the threshold, since the values before and after it are under the threshold.
Currently, my method is:
belowthreshold = []
for j in range(len(data)):
    if data[j] < threshold and data[j]:  # check if below threshold (and nonzero)
        belowthreshold.append(j)  # add this time to a list
However it quite clearly only extracts values less than the threshold.
What is the best way to approach this?
Thanks in advance for your answers
CodePudding user response:
Try a list comprehension using itertools.zip_longest:
import itertools
output = [i for i, (x, y, z) in enumerate(itertools.zip_longest(data,data[1:],data[2:],fillvalue=0)) if x<threshold or y<threshold or z<threshold]
>>> output
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 23, 24, 25, 26, 27]
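Note that this first attempt only looks two positions ahead, so it also picks up indices 23 and 24, which sit at the end of a long run that is not below the threshold. A minimal sketch to check this, assuming the data and threshold from the question (the name `expected` is introduced here only to hold the question's desired result):

```python
import itertools

threshold = 3
data = [2,2,2,2,2,5,5,2,2,2,2,3,4,5,6,4,5,4,3,4,5,3,3,7,8,2,2,2]
expected = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 26, 27]  # from the question

# Keep index i if data[i] or either of the next two values is below the threshold.
windows = itertools.zip_longest(data, data[1:], data[2:], fillvalue=0)
output = [i for i, (x, y, z) in enumerate(windows)
          if x < threshold or y < threshold or z < threshold]

# Indices that the look-ahead adds beyond the desired result.
extra = sorted(set(output) - set(expected))
print(extra)  # [23, 24]
```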
Edit:
To take into account that the "consecutive" timestamps can be on either side, you can use itertools.groupby with a custom key that checks whether the value is less than the threshold.
This splits the data into the following groups: [2, 2, 2, 2, 2], [5, 5], [2, 2, 2, 2], [3, 4, 5, 6, 4, 5, 4, 3, 4, 5, 3, 3, 7, 8], [2, 2, 2]
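output = list()
i = 0  # index of the first element of the current group
for k, v in itertools.groupby(data, key=lambda x: x < threshold):
    values = list(v)
    # keep groups below the threshold, and short groups (fewer than 4 values)
    if k or len(values) < 4:
        output += [i + x for x in range(len(values))]
    i += len(values)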
>>> output
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 26, 27]
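One thing to watch: the loop above also keeps a short run at the very start or end of the data, where it is not actually between two below-threshold ranges. If that matters for your data, here is a rough sketch of the same groupby idea wrapped in a function that only keeps interior short runs. The function name `below_with_short_gaps` and the parameter `max_gap` are made up for this example, not part of any library.

```python
import itertools

def below_with_short_gaps(data, threshold, max_gap=3):
    """Return indices below `threshold`, plus runs of at most `max_gap`
    other values that lie between two below-threshold runs.
    (Hypothetical helper; the name and `max_gap` are illustrative.)"""
    # Record each run as (is_below, start_index, length).
    runs = []
    start = 0
    for is_below, group in itertools.groupby(data, key=lambda x: x < threshold):
        length = len(list(group))
        runs.append((is_below, start, length))
        start += length

    result = []
    for n, (is_below, start, length) in enumerate(runs):
        # groupby alternates keys, so an interior run always has a
        # below-threshold neighbour on each side; only the first and
        # last runs can lack one.
        interior = 0 < n < len(runs) - 1
        if is_below or (interior and length <= max_gap):
            result.extend(range(start, start + length))
    return result

data = [2,2,2,2,2,5,5,2,2,2,2,3,4,5,6,4,5,4,3,4,5,3,3,7,8,2,2,2]
print(below_with_short_gaps(data, threshold=3))
```

With the question's data this gives the same list as above; the difference only shows up for inputs such as [5, 2, 2, 5], where the leading and trailing 5 are left out.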
CodePudding user response:
You can use a temporary list to store the steps while they are above the threshold, and add them to the result list if they stayed above it for only 3 or fewer steps; otherwise you reset the temporary list. Here we go:
threshold = 3
data = [2,2,2,2,2,5,5,2,2,2,2,3,4,5,6,4,5,4,3,4,5,3,3,7,8,2,2,2] # data
steps = []  # 'time stamp'
for i in range(len(data)):
    steps.append(i)
print(steps)
belowthreshold = []
temp_above_threshold = []
consecutive_above_counter = 0
for j in range(len(data)):
    if data[j] < threshold:
        if consecutive_above_counter < 4:  # add only if less than 4 steps were above threshold
            belowthreshold = belowthreshold + temp_above_threshold
        # reset counter and temporary list
        consecutive_above_counter = 0
        temp_above_threshold = []
        belowthreshold.append(j)  # add this time to a list
    else:
        consecutive_above_counter += 1
        if consecutive_above_counter < 4:
            temp_above_threshold.append(j)
        else:
            temp_above_threshold = []
print(belowthreshold)
Edit: I tried to give a simple solution that follows your code, without adding extra packages whose complexity might be difficult to keep track of later.
CodePudding user response:
I have managed to replicate your output with the following code:
def below_threshold(threshold, list_of_value):
    indices = set()
    for i in range(2, len(list_of_value)):
        if all(list_of_value[k] >= threshold for k in [i, i - 1, i - 2]):
            indices = indices.union({i, i-1, i-2})
    return set(range(len(list_of_value))).difference(indices)

print(below_threshold(3, [2, 2, 2, 2, 2, 5, 5, 2, 2, 2, 2, 3, 4, 5, 6, 4, 5, 4, 3, 4, 5, 3, 3, 7, 8, 2, 2, 2]))
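A caveat on the function above: it checks windows of three values, so a run of exactly three values at or above the threshold gets dropped, whereas the question's rule (fewer than 4) would keep it. Here is a possible variant with the window size as a parameter; the name `below_threshold_windowed` and the `window` argument are my own additions for illustration. It also returns a sorted list instead of a set, so the order of the result is predictable.

```python
def below_threshold_windowed(threshold, values, window=4):
    """Indices not covered by any `window` consecutive values >= threshold.
    (Hypothetical variant; `window` is an assumed parameter.)"""
    excluded = set()
    for end in range(window - 1, len(values)):
        span = range(end - window + 1, end + 1)
        # Exclude the whole span if every value in it is at or above the threshold.
        if all(values[k] >= threshold for k in span):
            excluded.update(span)
    return sorted(set(range(len(values))) - excluded)

data = [2,2,2,2,2,5,5,2,2,2,2,3,4,5,6,4,5,4,3,4,5,3,3,7,8,2,2,2]
print(below_threshold_windowed(3, data))
```

With `window=3` this behaves like the original function; with the default of 4, a run such as the three 5s in [2, 5, 5, 5, 2] is kept.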