In my code, I have a function that takes a number as input. This input heavily affects the function's running time. This is the line where I call the function, with an input value of 0.35:
frequent_itemsets = get_frequent_items(0.35)
The get_frequent_items function returns a DataFrame, and later in the code I use this DataFrame for other computations, so I need this call to return the DataFrame (here called frequent_itemsets) to be able to continue.
The input value of the function (0.35 in the example) affects the running time heavily: for example, with 0.35 the function takes 28 seconds to return, but with 0.3 it takes 2 hours.
I am thinking of limiting the allowed input values for the function to these options:
var_support_options = [0.18, 0.2, 0.25, 0.3, 0.35]
Now, my question is: is there a way to write the code so that it tries the function with these input values (provided in the var_support_options list), starting from the lowest value and going up to the biggest?
EXAMPLE OF DESIRED process:
iteration 1: frequent_itemsets = get_frequent_items(0.18)
If this iteration takes more than 30 seconds, stop it and try the next value in the input list (0.2 in the example). Else, if it takes less than 30 seconds, return the frequent_itemsets DataFrame and continue with the code.
I want the function to finish in under 30 seconds using the smallest input value possible, then return the result and continue to the next lines of code.
Should I do that using multithreading, multiprocessing, or something else? And how should the code look?
CodePudding user response:
You can use multiprocessing to run the code, because it has a method to kill/terminate a process. But it needs a queue to send the result back to the main process (processes don't share memory, so they can't use a global variable).
The main process then runs a loop which periodically checks whether there is a result in the queue and whether it is time to kill/terminate the other process.
One problem is that in multiprocessing, processes don't share memory, so the main process has to send data to the worker process, and it does this through serialization with pickle; for big data this may need extra time.
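A pandas DataFrame is picklable, so the real result can travel back through the queue the same way. A minimal sketch of such a worker, assuming your existing get_frequent_items(min_support) returns a DataFrame (the wrapper name and parameter here are made up for illustration):

def get_frequent_items_worker(queue, min_support):
    # call your real function; the DataFrame is pickled
    # automatically when it is put on the queue
    frequent_itemsets = get_frequent_items(min_support)
    queue.put(frequent_itemsets)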
Minimal working example. It is similar to the examples in the answers to the question suggested by @matszwecja, How to limit execution time of a function call?
import multiprocessing
import time

def get_frequent_items(queue, value):
    # simulate work that takes a different time for each value
    time.sleep(10 - (value * 10))
    # send the result back to the main process
    queue.put(value * 2)

def run(value, timeout=30):
    # queue to get the result back from the worker process
    q = multiprocessing.Queue()
    # start the worker process
    p = multiprocessing.Process(target=get_frequent_items, args=(q, value))
    p.start()
    start = time.time()
    while True:
        time.sleep(0.1)  # reduce CPU consumption
        end = time.time()
        print(f'time: {end-start:.1f}', end='\r')
        if q.empty():  # check if there is a result in the queue
            if end - start > timeout:  # check if it is time to kill the process
                p.terminate()
                return None  # return None when there is no result
        else:
            return q.get()  # return the result
# --- main ---
if __name__ == '__main__':
    for var_support in [0.18, 0.2, 0.25, 0.3, 0.35]:
        result = run(var_support, timeout=7)
        print('result:', result, 'for', var_support)
        # exit the loop when you get the first result
        # (use `is not None` because `if result:` would raise
        # "truth value is ambiguous" for a real DataFrame)
        if result is not None:
            break
    # --- after loop ---
    if result is not None:
        print('final result:', result, 'for', var_support)
    else:
        print('no result')
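As a design note, the polling loop can also be replaced by Process.join(timeout), which blocks until the worker finishes or the timeout expires. A sketch of the same run() function under that assumption (same imports and worker as above):

def run(value, timeout=30):
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=get_frequent_items, args=(q, value))
    p.start()
    # block for at most `timeout` seconds while the worker runs
    p.join(timeout)
    if not q.empty():   # the worker produced a result in time
        return q.get()
    p.terminate()       # no result yet: kill the worker
    p.join()            # wait for the termination to complete
    return None

Checking q.empty() rather than p.is_alive() matters for big results: a process that put a large object on the queue may not exit until the queue is drained, so the result can be ready while the process is still alive.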