How to increase looping performance

I'm working on a script that checks if time slices overlap or not.

I have a handler function that looks like this:

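def intersection_checker(foo, bar):
    if foo == bar:
        return True
    if foo[0] == bar[1] or foo[1] == bar[0]:
        return True
    # Either endpoint of foo lies strictly inside bar
    if bar[0] < foo[0] < bar[1] or bar[0] < foo[1] < bar[1]:
        return True
    # Either endpoint of bar lies strictly inside foo
    if foo[0] < bar[0] < foo[1] or foo[0] < bar[1] < foo[1]:
        return True
    return False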

Object foo is a tuple of two datetime.time() objects:

foo = (datetime.strptime('06:30:00','%H:%M:%S').time(), datetime.strptime('08:15:00','%H:%M:%S').time())

Object bar is a set() of foo-like tuples of datetime.time() objects. That set can contain 200k such tuples.

The line that calls the handler (intersection_checker) looks like this:

...
if len(bar) > 1 and True in set(map(intersection_checker, repeat(foo), bar)):
...

This code works. The problem is that it takes centuries to process such a large amount of data. I tried using a for loop to call the function for each item, but that did not perform as well as the built-in map. Perhaps there is a way to transfer and process large amounts of data more efficiently? Or to check intersections in a different way? And yes, it is enough to find only the first True value; it is not necessary to loop through the entire bar.

CodePudding user response：

You'll likely get a boost in performance by replacing this:

True in set(map(intersection_checker, repeat(foo), bar))

with this:

any(map(intersection_checker, repeat(foo), bar))

By converting to a set first, you're forcing the entire dataset to be mapped before it can determine if any of the values are True. Using any() will stop the map iterator as soon as a True value is found.
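
To make the difference concrete, here is a minimal sketch that counts how often the checker is called in each variant. The names counting_checker and calls are hypothetical, plain integers stand in for the datetime.time() values, and bar is a list rather than a set so that the matching item is known to come first.

```python
from itertools import repeat

calls = 0  # number of times the checker has been invoked

def counting_checker(foo, bar):
    """Hypothetical stand-in for intersection_checker that records each call."""
    global calls
    calls += 1
    return foo[0] <= bar[1] and bar[0] <= foo[1]

foo = (5, 7)
# The first range overlaps foo; the other 1000 do not.
bar = [(6, 8)] + [(100 + i, 101 + i) for i in range(1000)]

calls = 0
found_with_set = True in set(map(counting_checker, repeat(foo), bar))
calls_with_set = calls  # set() consumes the whole map before `in` runs

calls = 0
found_with_any = any(map(counting_checker, repeat(foo), bar))
calls_with_any = calls  # any() stops at the first True

print(calls_with_set, calls_with_any)  # prints: 1001 1
```

The gain depends on where the first overlap sits. If nothing overlaps, both variants still visit every item.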

CodePudding user response：

datetime.time objects support comparisons, so you can do:

def overlap(t1, t2):
    # Check whether two time ranges overlap.
    # Each argument is a (start, end) tuple of datetime.time objects.
    return not (t1[1] < t2[0] or t1[0] > t2[1])

Test this:

import datetime as dt

t1 = (dt.datetime.strptime('06:30:00','%H:%M:%S').time(), dt.datetime.strptime('08:30:00','%H:%M:%S').time())
t2 = (dt.datetime.strptime('08:29:00','%H:%M:%S').time(), dt.datetime.strptime('09:15:00','%H:%M:%S').time())
t3 = (dt.datetime.strptime('09:33:00','%H:%M:%S').time(), dt.datetime.strptime('09:47:00','%H:%M:%S').time())

>>> overlap(t1,t2)
True
>>> overlap(t1,t3)
False
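
A few edge cases may be worth checking before relying on this. The sketch below uses made-up sample ranges (morning, touching, inside, later, night) and assumes every range has its start before its end. Ranges that merely touch at an endpoint count as overlapping, which matches the second check in the question's function. A range that wraps past midnight breaks the assumption and is not detected.

```python
import datetime as dt

def overlap(t1, t2):
    # Disjoint only if one range ends before the other starts
    return not (t1[1] < t2[0] or t1[0] > t2[1])

morning = (dt.time(6, 30), dt.time(8, 15))
touching = (dt.time(8, 15), dt.time(9, 0))   # starts exactly when morning ends
inside = (dt.time(7, 0), dt.time(7, 30))     # fully contained in morning
later = (dt.time(8, 16), dt.time(9, 0))      # starts one minute after morning ends

touching_result = overlap(morning, touching)  # True: shared endpoint counts
inside_result = overlap(morning, inside)      # True
later_result = overlap(morning, later)        # False

# Caveat: a range crossing midnight has start > end and is not handled.
night = (dt.time(23, 0), dt.time(1, 0))
after_midnight = (dt.time(0, 30), dt.time(0, 45))
wrapped_result = overlap(night, after_midnight)  # False, although the clock times overlap
```

If such ranges can occur in the data, they would need to be split at midnight or converted to full datetime values first.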

Then use any(), or next() on a generator expression, to stop at the first True.

any(overlap(foo,x) for x in your_sequence)

Or, to also get the index of the first overlapping range:

next(((i, f'{x} overlaps {foo}') for i, x in enumerate(your_sequence) if overlap(foo, x)), (-1, 'no overlaps found'))
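
Put together, a self-contained run might look like the sketch below. The parse helper and the three sample ranges in your_sequence are made up for illustration; with a set instead of a list the iteration order, and therefore the reported index, is not guaranteed.

```python
import datetime as dt

def overlap(t1, t2):
    # Disjoint only if one range ends before the other starts
    return not (t1[1] < t2[0] or t1[0] > t2[1])

def parse(text):
    """Hypothetical helper: turn 'HH:MM:SS' into a datetime.time object."""
    return dt.datetime.strptime(text, '%H:%M:%S').time()

foo = (parse('06:30:00'), parse('08:15:00'))
your_sequence = [
    (parse('09:00:00'), parse('10:00:00')),  # no overlap
    (parse('08:00:00'), parse('08:45:00')),  # overlaps foo
    (parse('12:00:00'), parse('13:00:00')),  # never reached by any() or next()
]

# Stops at the first overlapping range
has_overlap = any(overlap(foo, x) for x in your_sequence)

# Index of the first overlapping range, or -1 if there is none
first_index = next((i for i, x in enumerate(your_sequence) if overlap(foo, x)), -1)

print(has_overlap, first_index)  # prints: True 1
```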