Pythonic method to pull the start / end of specific intervals within a list?

I have an ordered list, for example:

    my_list = [0.1, 0.2, 0.3, 0.4, 2.6, 2.7, 2.8, 2.9,
               5.1, 5.2, 6.1, 6.2, 6.3, 7.1, 7.2, 7.3,
               7.4, 7.5, 10.1, 10.2, 10.3, 10.4, 10.5]

I need the intervals in which consecutive numbers are less than 1 s apart and that contain at least 3 numbers. I only want the start and end of each interval. For example:

Output: [[0.1, 0.4], [2.6, 2.9], [5.1, 7.5], [10.1, 10.5]]

In[0]: print(start)
Output: [0.1, 2.6, 5.1, 10.1]
In[1]: print(end)
Output: [0.4, 2.9, 7.5, 10.5]

I've tried a variety of loops, but I'm having trouble appending only the start times to a new list, and avoiding an "Index out of range" error when I reach the end of the list. Here is where I'm at currently:

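    start, end, temp = [], [], []

    for i in range(0, (len(my_list)-2)):

        second = i + 1
        third = i + 2

        # Are my_list[i], my_list[second] and my_list[third] each less than 1 apart?
        if ((my_list[third] - my_list[second])) < 1 and ((my_list[second] - my_list[i]) < 1):
            temp.append(my_list[i])
        else:
            end.append(my_list[second])
            start.append(temp[0])  # IndexError when temp is empty
            temp.clear()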

My solution to get only the start of the interval is to append the items to a temporary list, append the first and last element and clear that list. I'm sure there is a more elegant way to do this, and the list can be thousands of rows so I don't think this is a very efficient method.
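For reference, the temporary-list idea can be slimmed down to a single pass that remembers only the index where the current run started. Comparing each element with the one before it (instead of looking two ahead) and treating the end of the list as one more break means the loop never reads past the end and the final run is not lost. A minimal sketch, assuming a sorted list, the gap of 1 and the 3-number minimum from the question; run_start is an illustrative name:

```python
my_list = [0.1, 0.2, 0.3, 0.4, 2.6, 2.7, 2.8, 2.9,
           5.1, 5.2, 6.1, 6.2, 6.3, 7.1, 7.2, 7.3,
           7.4, 7.5, 10.1, 10.2, 10.3, 10.4, 10.5]

start, end = [], []
run_start = 0  # index of the first element in the current run

for i in range(1, len(my_list) + 1):
    # A run ends at the end of the list or where the gap to the previous value is >= 1.
    # Short-circuiting means my_list[i] is only read while i < len(my_list).
    if i == len(my_list) or my_list[i] - my_list[i - 1] >= 1:
        if i - run_start >= 3:  # the run covers indices run_start .. i - 1
            start.append(my_list[run_start])
            end.append(my_list[i - 1])
        run_start = i

print(start)  # [0.1, 2.6, 5.1, 10.1]
print(end)    # [0.4, 2.9, 7.5, 10.5]
```

This does no copying into a temporary list, so it stays linear in time with constant extra memory even for long lists.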

Any help would be much appreciated.

Answer:

Here's another method, using NumPy with a list comprehension:

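import numpy as np
# Split after each gap > 1, keep runs of 3+ values; float() turns NumPy scalars into plain floats
out = [[float(arr[0]), float(arr[-1])] for arr in np.split(my_list, np.where(np.diff(my_list) > 1)[0] + 1) if len(arr) > 2]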

Or, if you don't want to use NumPy, you can get the same result with two list comprehensions:

splits = [0] + [idx + 1 for idx, (i, j) in enumerate(zip(my_list, my_list[1:])) if j - i > 1] + [len(my_list)]
out = [[my_list[start], my_list[end - 1]] for start, end in zip(splits, splits[1:]) if end - start > 2]

Output:

[[0.1, 0.4], [2.6, 2.9], [5.1, 7.5], [10.1, 10.5]]
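If you want the separate start and end lists from the question rather than pairs, the pairs can be transposed with zip(*out). A short sketch that recomputes out with the two-comprehension method so it runs on its own; the empty-list guard is an addition, since unpacking zip(*[]) would fail:

```python
my_list = [0.1, 0.2, 0.3, 0.4, 2.6, 2.7, 2.8, 2.9,
           5.1, 5.2, 6.1, 6.2, 6.3, 7.1, 7.2, 7.3,
           7.4, 7.5, 10.1, 10.2, 10.3, 10.4, 10.5]

# Split positions: 0, each index that follows a gap > 1, and len(my_list)
splits = ([0]
          + [idx + 1 for idx, (i, j) in enumerate(zip(my_list, my_list[1:])) if j - i > 1]
          + [len(my_list)])
# [start, end] pairs for slices holding more than 2 values
out = [[my_list[s], my_list[e - 1]] for s, e in zip(splits, splits[1:]) if e - s > 2]

# Transpose the pairs into two lists; fall back to empty lists when out is empty
start, end = map(list, zip(*out)) if out else ([], [])

print(start)  # [0.1, 2.6, 5.1, 10.1]
print(end)    # [0.4, 2.9, 7.5, 10.5]
```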

Answer:

Here's one way:

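# A gap of more than 1 between z[0] and z[1] ends one interval at z[0] and starts the next at z[1]
start = my_list[:1] + [z[1] for z in zip(my_list[0:-1], my_list[1:]) if z[0] + 1 < z[1]]
end = [z[0] for z in zip(my_list[0:-1], my_list[1:]) if z[0] + 1 < z[1]] + my_list[-1:]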
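These zip-based start and end lists do not check the "at least 3 numbers" requirement. One way to add it, sketched here with the same "a + 1 < b" gap test, is to collect the indices where runs begin and end instead of the values, and drop runs shorter than 3. The names breaks, firsts, lasts and runs are illustrative:

```python
my_list = [0.1, 0.2, 0.3, 0.4, 2.6, 2.7, 2.8, 2.9,
           5.1, 5.2, 6.1, 6.2, 6.3, 7.1, 7.2, 7.3,
           7.4, 7.5, 10.1, 10.2, 10.3, 10.4, 10.5]

# Indices i such that a gap of more than 1 follows my_list[i]
breaks = [i for i, (a, b) in enumerate(zip(my_list, my_list[1:])) if a + 1 < b]

# Each run spans indices firsts[k] .. lasts[k] (inclusive)
firsts = [0] + [i + 1 for i in breaks]
lasts = breaks + [len(my_list) - 1]

# Keep only runs that contain at least 3 numbers
runs = [(lo, hi) for lo, hi in zip(firsts, lasts) if hi - lo + 1 >= 3]
start = [my_list[lo] for lo, _ in runs]
end = [my_list[hi] for _, hi in runs]

print(start)  # [0.1, 2.6, 5.1, 10.1]
print(end)    # [0.4, 2.9, 7.5, 10.5]
```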

Answer:

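You can use a stack (here a plain list, tmp) to keep track of consecutive values that satisfy the condition. When a value arrives that breaks the condition, check whether the stack holds at least 3 values; if it does, add its first and last elements to the result list. Then start a new stack with the new value, and check the last stack once more after the loop.

For example: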

my_list = [0.1, 0.2, 0.3, 0.4, 2.6, 2.7, 2.8, 2.9,
            5.1, 5.2, 6.1, 6.2, 6.3, 7.1, 7.2, 7.3,
            7.4, 7.5, 10.1, 10.2, 10.3, 10.4, 10.5]

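result = []
tmp = []  # "stack" holding the current run of close-together values

for v in my_list:
    if not tmp or abs(tmp[-1] - v) < 1:
        # Start the first run, or continue the current one
        tmp.append(v)
    else:
        # Gap of 1 or more: close the current run if it is long enough
        if len(tmp) >= 3:
            result.append([tmp[0], tmp[-1]])
        tmp = [v]  # the new value starts the next run

# Close the last run, which no later value ends
if len(tmp) >= 3:
    result.append([tmp[0], tmp[-1]])

print(result)
# output: [[0.1, 0.4], [2.6, 2.9], [5.1, 7.5], [10.1, 10.5]]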
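The same idea can be wrapped in a reusable generator that remembers only the first value, last value and length of the current run instead of the whole stack, with the gap and minimum length as parameters. A sketch assuming the input is sorted in ascending order; iter_intervals, max_gap and min_len are hypothetical names, and, like the loop above, a difference of exactly max_gap counts as a break:

```python
def iter_intervals(values, max_gap=1.0, min_len=3):
    """Yield (first, last) for each run of ascending values whose neighbours
    differ by less than max_gap and that holds at least min_len values."""
    first = last = None
    count = 0
    for v in values:
        if count and v - last < max_gap:
            # v continues the current run
            last = v
            count += 1
        else:
            # v breaks the run (or is the very first value): emit the old run if long enough
            if count >= min_len:
                yield first, last
            first = last = v
            count = 1
    # Emit the final run, which no later value closes
    if count >= min_len:
        yield first, last


my_list = [0.1, 0.2, 0.3, 0.4, 2.6, 2.7, 2.8, 2.9,
           5.1, 5.2, 6.1, 6.2, 6.3, 7.1, 7.2, 7.3,
           7.4, 7.5, 10.1, 10.2, 10.3, 10.4, 10.5]

result = [list(pair) for pair in iter_intervals(my_list)]
print(result)  # [[0.1, 0.4], [2.6, 2.9], [5.1, 7.5], [10.1, 10.5]]
```

Because it accepts any iterable, it also works on values read lazily (for example from a file) without building the whole list first.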

Answer:

Using Pandas:

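import pandas as pd

arr = pd.Series(my_list)
# Group by the integer part of each value; keep each group's first and last rows
arr = arr.groupby(arr.astype(int)).nth([0, -1])
result = list(zip(arr.iloc[::2], arr.iloc[1::2]))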

Or, without pandas, you can use itertools.groupby with int as the key (note: this assumes the list is sorted):

from itertools import groupby

my_list = [0.1, 0.2, 0.3, 0.4, 2.6, 2.7, 2.8, 2.9,
            5.1, 5.2, 6.1, 6.2, 6.3, 7.1, 7.2, 7.3, 
            7.4, 7.5, 10.1, 10.2, 10.3, 10.4, 10.5]

result = []
for k, g in groupby(my_list, int):
    group = list(g)
    result.append([group[0], group[-1]])

Or as a comprehension (the := operator needs Python 3.8 or later):

result = [[f:=next(g), [f, *g][-1]] for k, g in groupby(my_list, int)]
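Note that grouping by int is not quite the question's rule: 5.1 to 7.5 comes out as three groups ([5.1, 5.2], [6.1, 6.3] and [7.1, 7.5]), and no group is filtered by size. To keep itertools.groupby but follow the gap rule, one option is to group on a precomputed run id that goes up by one after every gap of 1 or more. A sketch assuming a sorted list; run_ids is an illustrative name, and accumulate's initial= argument, like :=, needs Python 3.8+:

```python
from itertools import accumulate, groupby
from operator import itemgetter

my_list = [0.1, 0.2, 0.3, 0.4, 2.6, 2.7, 2.8, 2.9,
           5.1, 5.2, 6.1, 6.2, 6.3, 7.1, 7.2, 7.3,
           7.4, 7.5, 10.1, 10.2, 10.3, 10.4, 10.5]

# Run id per element: starts at 0 and increases by 1 after every gap >= 1
run_ids = accumulate((int(b - a >= 1) for a, b in zip(my_list, my_list[1:])), initial=0)

# Group (run_id, value) pairs by run id and keep runs with at least 3 values
result = [[g[0], g[-1]]
          for _, pairs in groupby(zip(run_ids, my_list), key=itemgetter(0))
          if len(g := [v for _, v in pairs]) >= 3]

print(result)  # [[0.1, 0.4], [2.6, 2.9], [5.1, 7.5], [10.1, 10.5]]
```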