My script is as follows:
import pandas as pd

df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                   'A': ['A0', 'A1', 'A2', 'A3']})

def make_df(year):
    df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'], str(year): [str(year), str(year + 1), str(year + 2), str(year + 3)]})
    return df

for year in range(2020, 2015, -1):
    df = pd.merge(df, make_df(year), on=['key'], how='left')
The final df will be:
  key   A  2020  2019  2018  2017  2016
0  K0  A0  2020  2019  2018  2017  2016
1  K1  A1  2021  2020  2019  2018  2017
2  K2  A2  2022  2021  2020  2019  2018
3  K3  A3  2023  2022  2021  2020  2019
My actual make_df(year) is much more complex and takes too much time.
How can I parallelize the for-loop (for year in range(2020, 2015, -1):) and shorten the processing time?
CodePudding user response:
I'm not sure you need threading in this case. You can create a list of DataFrames and join them all at once, getting rid of your function:
import pandas as pd

df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                   'A': ['A0', 'A1', 'A2', 'A3']})
nb_row = len(df.index)
df_list = [pd.DataFrame(range(year, year + nb_row), columns=[str(year)]) for year in range(2020, 2015, -1)]
df = df.join(df_list)
Edit: keeping make_df:
def make_df(year):
    df = pd.DataFrame({str(year): [str(year), str(year + 1), str(year + 2), str(year + 3)]})
    return df

df_list = [make_df(year) for year in range(2020, 2015, -1)]
df = df.join(df_list)
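One thing to watch: df.join(df_list) aligns rows on the index, not on the key column. If your real make_df still returns a key column (as in the question) and its rows might come back in a different order, a possible sketch is to index every frame by key before joining. The name result and the list comprehension inside make_df are only for illustration; the toy data is the same as in the question.

```python
import pandas as pd

df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                   'A': ['A0', 'A1', 'A2', 'A3']})

def make_df(year):
    # Same toy frame as in the question, including the 'key' column.
    return pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                         str(year): [str(year + i) for i in range(4)]})

# Index every frame by 'key' so the join aligns on key values,
# not on row position.
frames = [make_df(year).set_index('key') for year in range(2020, 2015, -1)]

# One join over the whole list, then turn 'key' back into a column.
result = df.set_index('key').join(frames, how='left').reset_index()
print(result)
```

This still does a single join instead of one merge per year, so it keeps the benefit of the approach above while matching rows by key.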
CodePudding user response:
After reading your comments it seems that you want a thread for each year:
import threading
import pandas as pd

df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                   'A': ['A0', 'A1', 'A2', 'A3']})
year_range = range(2020, 2015, -1)
df_threads = []
df_list = []

def make_df(year):
    df = pd.DataFrame({str(year): [str(year), str(year + 1), str(year + 2), str(year + 3)]})
    # frames are appended in the order the threads finish,
    # so the column order of the final join can vary
    df_list.append(df)

for year in year_range:
    t = threading.Thread(target=make_df, args=[year])
    df_threads.append(t)
    t.start()

# waiting for threads:
for t in df_threads:
    t.join()

df = df.join(df_list)
print(df)
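If you would rather not manage the threads and the shared list by hand, the same idea can probably be written with concurrent.futures from the standard library. This is only a sketch on the toy make_df from above; the names years and result are mine. Executor.map returns the results in the order of the input years, so the columns come out in a fixed order no matter which thread finishes first.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                   'A': ['A0', 'A1', 'A2', 'A3']})

def make_df(year):
    # Stand-in for the slow function from the question.
    return pd.DataFrame({str(year): [str(year + i) for i in range(4)]})

years = range(2020, 2015, -1)

# Executor.map yields results in the order of `years`,
# regardless of which worker finishes first.
with ThreadPoolExecutor(max_workers=len(years)) as executor:
    df_list = list(executor.map(make_df, years))

result = df.join(df_list)
print(result)
```

Keep in mind that threads mostly help when make_df spends its time waiting (file or network I/O) or in code that releases the GIL. If it is CPU-bound Python code, ProcessPoolExecutor has the same map interface and may be worth trying instead; in that case make_df has to be defined at module level and the pool should be started under an if __name__ == "__main__": guard.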