Say I have a list of floats like this one:
00000.0001
00000.0002
00000.0003
00000.0004
00001.0009
12345.0015
12345.0016
12345.0017
12345.0018
12345.0019
12346.0010
54321.0021
54321.0022
54321.0023
My goal is to sum the first and last elements of each group, where a group is a run of values within a given range (in my case, plus 1 at most), for example:
00000.0001 00001.0009
12345.0015 12346.0010
54321.0021 54321.0023
I'm trying to iterate the array using zip and range based on the length of the array, storing the first element of the group first, but cannot quite nail the solution.
I don't know much about pandas, but would that help? Any other tip would be appreciated; I don't necessarily need the exact solution.
CodePudding user response:
Just iterate over the list and start a new group whenever a value is at least the threshold away from the value that started the current group. Keep track of that starting value and add it to the last value of the group.
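things = [1, 2, 3, 4, 10, 11, 12, 21, 22, 23]
start = things[0]   # first value of the current group
threshold = 10
totals = []
for i, n in enumerate(things):
    # n is too far from the group's start: close the group and open a new one
    if n - start >= threshold:
        totals.append(start + things[i - 1])
        start = n
# close the final group (it may contain a single value)
if start != n:
    totals.append(start + n)
else:
    totals.append(n)
print(totals)
assert sum(totals) == (1 + 10) + (11 + 12) + (21 + 23)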
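The same loop can be wrapped in a function that returns the (first, last) pair of each group, leaving the summing to the caller. This is only a sketch: the name first_last_pairs is made up for illustration, the input is assumed to be sorted, and a group is assumed to end once a value is at least threshold away from the group's first value, as in the loop above.

```python
def first_last_pairs(values, threshold):
    """Return (first, last) for each run of values that stays within
    `threshold` of the run's first element. Assumes `values` is sorted."""
    pairs = []
    if not values:
        return pairs
    start = prev = values[0]
    for n in values[1:]:
        if n - start >= threshold:
            # n is too far from the run's start: close the run, open a new one
            pairs.append((start, prev))
            start = n
        prev = n
    pairs.append((start, prev))  # close the final run
    return pairs


things = [1, 2, 3, 4, 10, 11, 12, 21, 22, 23]
pairs = first_last_pairs(things, 10)
print(pairs)                      # [(1, 10), (11, 12), (21, 23)]
print([a + b for a, b in pairs])  # [11, 23, 44]
```

Unlike the loop above, a group with a single value comes back as (n, n), so the caller decides whether to count that value once or twice.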
CodePudding user response:
You can use zip to identify the elements that are more than 1.0 (the threshold) away from their predecessor. Applying a cumulative sum to these breaks produces group identifiers that can then be used with groupby():
L = [0.0001,
0.0002,
0.0003,
0.0004,
1.0009,
12345.0015,
12345.0016,
12345.0017,
12345.0018,
12345.0019,
12346.0010,
54321.0021,
54321.0022,
54321.0023]
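from itertools import groupby, accumulate
threshold = 1.0
# Running count of breaks: a break is a gap to the predecessor larger than the threshold
groups = accumulate(b - a > threshold for a, b in zip(L[:1] + L, L))
# The key is a one-element list bound to g; assigning to g[:] fills it with the group's items
result = [(g[0], g[-1]) for g, g[:] in groupby(L, lambda _: [next(groups)])]
print(result)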
[(0.0001, 0.0004),
(1.0009, 1.0009), # 1.0009 - 0.0004 = 1.0005 > 1.0
(12345.0015, 12346.001),
(54321.0021, 54321.0023)]
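If the groupby() one-liner is hard to follow, the same gap-based grouping can be written as a plain loop. This is a sketch under the same assumptions (sorted input, a new group whenever the gap to the previous value exceeds the threshold). The name split_on_gaps and the shortened data list are illustrative only.

```python
def split_on_gaps(values, threshold):
    """Split sorted `values` into runs, starting a new run whenever the
    gap to the previous value exceeds `threshold`."""
    runs = []
    for v in values:
        if runs and v - runs[-1][-1] <= threshold:
            runs[-1].append(v)   # close enough to the previous value
        else:
            runs.append([v])     # first value, or gap too large: new run
    return runs


data = [0.0001, 0.0004, 1.0009, 12345.0015, 12346.0010, 54321.0021, 54321.0023]
runs = split_on_gaps(data, 1.0)
ends = [(r[0], r[-1]) for r in runs]
print(ends)
```

Because each value is compared with its immediate predecessor rather than with the first value of its group, a long chain of small steps stays in one group even when its total span exceeds the threshold.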