Allow a function (section of a code) to run only for 30 seconds, if not finished and returned the va

In my code, I have a function that takes a number as input, and this input heavily affects the function's running time. This is the line where I call the function, with an input value of 0.35.

frequent_itemsets = get_frequent_items(0.35)

The get_frequent_items function returns a DataFrame, and later in the code I use this DataFrame for other computations, so I need the function to return the DataFrame (here called frequent_itemsets) before the code can continue.

The input value (0.35 in the example) heavily affects the running time: with 0.35 the function takes 28 seconds to return, and with 0.3 it takes 2 hours.

I am thinking of limiting the input values for the function to these options:

var_support_options = [0.18, 0.2, 0.25, 0.3, 0.35]

Now, my question is: is there a way to write the code so that it tries the function with the input values in the var_support_options list, starting from the lowest value and moving to the highest?

EXAMPLE OF DESIRED process:

iteration 1 : frequent_itemsets = get_frequent_items(0.18)

If this iteration takes more than 30 seconds, stop it and try the next value in the input list (0.2 in the example).
Otherwise, if it takes less than 30 seconds, return the frequent_itemsets DataFrame and continue the code.

I want the function to finish in less than 30 seconds using the smallest input value that allows it, then return the result and continue to the next lines of code.

Should I do that using multithreading, multiprocessing, or something else? And what should the code look like?

CodePudding user response：

You can use multiprocessing to run the code, because a process has a method to kill/terminate it. But it needs a queue to send the result back to the main process (because processes don't share memory, so they can't use a global variable).
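To see why a queue is needed, here is a small sketch (the names counter, bump_global, bump_and_report and demo are made up for this illustration). A child process that changes a global only changes its own copy, so the main process has to receive the value through a queue.

```python
import multiprocessing

counter = 0  # module-level global (hypothetical name, only for this demo)

def bump_global():
    # Runs in the child process: changes only the child's copy of `counter`.
    global counter
    counter += 1

def bump_and_report(queue):
    # Runs in the child process: sends the new value back explicitly.
    global counter
    counter += 1
    queue.put(counter)

def demo():
    # 1) Change the global in a child and look at it in the parent.
    p = multiprocessing.Process(target=bump_global)
    p.start()
    p.join()
    seen_in_parent = counter     # the parent's copy was never touched

    # 2) Do the same, but send the value back through a queue.
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=bump_and_report, args=(q,))
    p.start()
    reported = q.get()           # read the result before joining
    p.join()
    return seen_in_parent, reported

if __name__ == '__main__':
    print(demo())
```

The first number is what the main process sees in its own global after the child has finished, the second is what the child sent through the queue.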

The main process then runs a loop that periodically checks whether there is a result in the queue and whether it is time to kill/terminate the other process.

One problem is that processes in multiprocessing don't share memory, so data has to be sent between them, and the queue serializes it with pickle. For a big result, such as a large DataFrame, this may take extra time.
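If you want a rough idea of that cost, you can time pickle on your own data. This is only a sketch: pickled_size_and_time is a made-up helper, and the list of tuples is a stand-in for a real DataFrame, so the numbers will differ for your result.

```python
import pickle
import time

def pickled_size_and_time(obj):
    # Serialize the object the way a multiprocessing.Queue would (with pickle)
    # and report the size in bytes and the time it took in seconds.
    start = time.perf_counter()
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    elapsed = time.perf_counter() - start
    return len(data), elapsed

if __name__ == '__main__':
    # Stand-in for a big result (assumption: your real object is a DataFrame).
    rows = [(i, i * 0.5) for i in range(1_000_000)]
    size, seconds = pickled_size_and_time(rows)
    print(f'{size / 1e6:.1f} MB pickled in {seconds:.3f} s')
```

This measures only the serialization step. Sending the bytes through the pipe and unpickling them in the main process adds to it.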

Minimal working example.

It is similar to the examples in the answers suggested by @matszwecja:
How to limit execution time of a function call?

import multiprocessing
import time

def get_frequent_items(queue, value):
    # simulate work that takes a different amount of time for each value
    time.sleep(10-(value*10))

    # send result
    queue.put(value*2)

def run(value, timeout=30):
    
    # queue to get the result
    q = multiprocessing.Queue()

    # start process
    p = multiprocessing.Process(target=get_frequent_items, args=(q, value))
    p.start()

    start = time.time()

    while True:

        time.sleep(0.1)  # reduce CPU consumption

        end = time.time()
        
        print(f'time: {end-start:.1f}', end='\r')

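        if q.empty():                # check if there is result in queue
            if end-start > timeout:  # check if it is time to kill process
                p.terminate()
                p.join()             # wait until the killed process is cleaned up
                return None          # return None when there is no result
        else:
            result = q.get()         # read the result before joining
            p.join()
            return result            # return result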
        
# ---- main ---

if __name__ == '__main__':
    
    for value in [0.18, 0.2, 0.25, 0.3, 0.35]:

        result = run(value, timeout=7)

        print('result:', result, 'for', value)

        # exit loop when you get first result
        # (use `is not None` because `if result:` raises ValueError for a DataFrame)
        if result is not None:
            break

    # --- after loop ---

    if result is not None:
        print('final result:', result, 'for', value)
    else:
        print('no result')
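A variation on the example above, as a sketch only: instead of polling q.empty() in a loop, the main process can block on q.get(timeout=...) and catch queue.Empty when the time runs out. The names worker, run_with_timeout and first_result are made up here, worker is a stand-in for get_frequent_items, and the sleep times and values are arbitrary.

```python
import multiprocessing
import queue
import time

def worker(q, value):
    # Hypothetical stand-in for get_frequent_items: smaller values take longer.
    time.sleep(1 - value)
    q.put(value * 2)

def run_with_timeout(value, timeout):
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q, value))
    p.start()
    try:
        # Blocks until a result arrives or `timeout` seconds have passed.
        return q.get(timeout=timeout)
    except queue.Empty:
        p.terminate()        # no result in time: kill the process
        return None
    finally:
        p.join()             # clean up the process in both cases

def first_result(values, timeout):
    # Try the values from lowest to highest and stop at the first one
    # that finishes within the time limit.
    for value in sorted(values):
        result = run_with_timeout(value, timeout)
        if result is not None:
            return value, result
    return None, None

if __name__ == '__main__':
    value, result = first_result([0.1, 0.5, 0.9], timeout=0.7)
    print('final result:', result, 'for', value)
```

This does the same job without the time.sleep(0.1) polling, but you lose the place where the loop printed the elapsed time. If the function can legitimately return None, you would need a different marker for "timed out".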