I have two functions that I want to run concurrently to compare performance. At the moment I run them one after the other, and it takes quite some time.
Here is how I'm running them:
import pandas as pd
import threading

df = pd.read_csv('data/Detalhado_full.csv', sep=',', dtype={'maquina':str})

def gerar_graph_36():
    df_ordered = df.query(f'maquina=="3.6"')[['data', 'dia_semana', 'oee', 'ptg_ruins', 'prod_real_kg', 'prod_teorica_kg']].sort_values(by='data')
    oee = df_ordered['oee'].iloc[-1:].iloc[0]
    return oee

def gerar_graph_31():
    df_ordered = df.query(f'maquina=="3.1"')[['data', 'dia_semana', 'oee', 'ptg_ruins', 'prod_real_kg', 'prod_teorica_kg']].sort_values(by='data')
    oee = df_ordered['oee'].iloc[-1:].iloc[0]
    return oee

oee_36 = gerar_graph_36()
oee_31 = gerar_graph_31()
print(oee_36, oee_31)
I tried to apply threading with the statements below, but the return values are not captured; it prints None instead.
print(oee_31, oee_36) -> Expected: 106.3 99.7 // Actual: None None
oee_31 = threading.Thread(target=gerar_graph_31, args=()).start()
oee_36 = threading.Thread(target=gerar_graph_36, args=()).start()
print(oee_31, oee_36)
As a check, the command below returns 3, as expected:
print(threading.active_count())
I need the oee value returned by each function, something like 103.8.
Thanks in advance!!
CodePudding user response:
Creating and starting a new thread is not like calling a function that returns a value: the Thread.start() call just starts running the target code in the other thread and returns immediately. Its own return value is always None, and whatever the target function returns is discarded.
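A minimal sketch of that behavior, using a hypothetical compute() function that stands in for the gerar_graph functions:

```python
import threading

def compute():
    # Hypothetical stand-in for gerar_graph_31 / gerar_graph_36.
    # threading.Thread ignores whatever the target returns.
    return 42

thread = threading.Thread(target=compute)
start_result = thread.start()  # start() itself always returns None
thread.join()                  # wait for the thread to finish

print(start_result)
```

This is why the question's code prints None None: the variables hold the result of .start(), not the value computed inside the thread.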
To collect results computed in other threads, you have to communicate them to the main thread through some data structure. An ordinary list or dictionary will do, or you could use a queue.Queue.
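As a rough sketch of the queue.Queue variant: the SAMPLE_ROWS data, latest_oee() and worker() below are hypothetical stand-ins for the CSV and the gerar_graph functions, used only so the example runs on its own.

```python
import queue
import threading

# Hypothetical sample data standing in for the CSV in the question.
SAMPLE_ROWS = [
    {"maquina": "3.1", "data": "2023-01-01", "oee": 98.2},
    {"maquina": "3.1", "data": "2023-01-02", "oee": 106.3},
    {"maquina": "3.6", "data": "2023-01-01", "oee": 101.0},
    {"maquina": "3.6", "data": "2023-01-02", "oee": 99.7},
]

def latest_oee(machine):
    # Most recent oee value for one machine (rows sorted by date).
    rows = sorted(
        (row for row in SAMPLE_ROWS if row["maquina"] == machine),
        key=lambda row: row["data"],
    )
    return rows[-1]["oee"]

def worker(machine, out_queue):
    # Each thread puts a (machine, value) pair on the shared queue.
    out_queue.put((machine, latest_oee(machine)))

result_queue = queue.Queue()
threads = [
    threading.Thread(target=worker, args=(machine, result_queue))
    for machine in ("3.1", "3.6")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# All producers have finished, so draining until empty is safe here.
results = {}
while not result_queue.empty():
    machine, oee = result_queue.get()
    results[machine] = oee

print(results["3.1"], results["3.6"])
```

Tagging each result with its machine name avoids depending on the order in which the threads happen to finish.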
If you want something closer to a function call, without having to modify the gerar_graph() functions, you could use the concurrent.futures module instead of threading. It is higher-level code that wraps each call in a "future" object, which lets you check when the call is done and fetch the value returned by the function.
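A minimal sketch of the concurrent.futures approach. The two functions here are simplified stand-ins that keep the original names and return values from hypothetical sample data instead of reading the CSV; the point is that they still use a plain return.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data standing in for the CSV in the question.
SAMPLE_OEE = {"3.1": [98.2, 106.3], "3.6": [101.0, 99.7]}

def gerar_graph_31():
    # Unmodified in spirit: the function simply returns its value.
    return SAMPLE_OEE["3.1"][-1]

def gerar_graph_36():
    return SAMPLE_OEE["3.6"][-1]

with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() schedules the call and immediately returns a Future.
    future_31 = executor.submit(gerar_graph_31)
    future_36 = executor.submit(gerar_graph_36)
    # result() blocks until that call has finished, then returns its value.
    oee_31 = future_31.result()
    oee_36 = future_36.result()

print(oee_31, oee_36)
```

Both calls are submitted before either result() is requested, so they can run concurrently while the main thread waits.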
Otherwise, simply have a top-level variable containing a list, wait for your threads to finish running (a thread stops when the function passed as "target" returns), and collect the results:
import pandas as pd
import threading

df = pd.read_csv('data/Detalhado_full.csv', sep=',', dtype={'maquina':str})
results = []

def gerar_graph_36():
    df_ordered = df.query(f'maquina=="3.6"')[['data', 'dia_semana', 'oee', 'ptg_ruins', 'prod_real_kg', 'prod_teorica_kg']].sort_values(by='data')
    oee = df_ordered['oee'].iloc[-1:].iloc[0]
    results.append(oee)

def gerar_graph_31():
    df_ordered = df.query(f'maquina=="3.1"')[['data', 'dia_semana', 'oee', 'ptg_ruins', 'prod_real_kg', 'prod_teorica_kg']].sort_values(by='data')
    oee = df_ordered['oee'].iloc[-1:].iloc[0]
    results.append(oee)

# We need to keep a reference to the threads themselves
# so that we can call both ".start()" (which always returns None)
# and ".join()" on them.
oee_31 = threading.Thread(target=gerar_graph_31)
oee_31.start()
oee_36 = threading.Thread(target=gerar_graph_36)
oee_36.start()
oee_31.join()  # will block and return only when the task is done, but oee_36 will be running concurrently
oee_36.join()
print(results)  # the order depends on which thread appended first
If you need more than 2 threads (like all 36...), I strongly suggest using concurrent.futures: you can limit the number of workers to a number comparable to the logical CPUs you have. And, of course, manage your tasks and calls in a list or dictionary instead of having a separate variable name for each.
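One way this could look, as a sketch under assumptions: latest_oee() and SAMPLE_OEE are hypothetical stand-ins for a single parameterized version of the gerar_graph functions and for the CSV data, and the worker cap is just one reasonable choice.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical sample data standing in for the CSV in the question.
SAMPLE_OEE = {"3.1": [98.2, 106.3], "3.6": [101.0, 99.7]}

def latest_oee(machine):
    # One parameterized function instead of one function per machine.
    return SAMPLE_OEE[machine][-1]

machines = list(SAMPLE_OEE)

# Cap the pool at the number of logical CPUs (os.cpu_count() may return None).
max_workers = min(len(machines), os.cpu_count() or 1)

with ThreadPoolExecutor(max_workers=max_workers) as executor:
    # Map each Future back to the machine it was submitted for.
    futures = {executor.submit(latest_oee, machine): machine for machine in machines}
    results = {}
    for future in as_completed(futures):
        results[futures[future]] = future.result()

print(results)
```

Keeping the futures in a dictionary means adding another machine only requires a new entry in the data, not a new function or variable name.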