I have a list of 5000 dictionaries where each dictionary has around 40 items. I have built a for loop that is extremely slow - it needs a couple of minutes.
# symbol_list_final is the list of dictionaries
symbols_dataframe = pd.DataFrame([symbols_list_final[0]])
for i in range(len(symbols_list_final) - 1):
    symbol_df_temp = pd.DataFrame([symbols_list_final[i + 1]])
    symbols_dataframe = pd.concat((symbols_dataframe, symbol_df_temp), axis=1)
    print(i)
Is there any way of doing it faster?
EDIT: It's even slower than I said. My program is running right now, and it takes about 1 second to make 4-5 iterations.
CodePudding user response:
It seems like you are trying to build a DataFrame from each dict and concatenate them into a single variable containing your end_df. Firstly, the correct approach involves not concatenating all the time, but running that command only once. So I would recommend stacking the df objects in a list, and then concatenating:
list_of_dfs = []
for d in list_dict:
    list_of_dfs.append(pd.DataFrame([d]))
result = pd.concat(list_of_dfs, ignore_index=True)
A single pd.concat at the end is much cheaper than redefining your variable on every iteration of the loop, because each concat copies the whole accumulated frame.
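For instance, here is a minimal runnable sketch of that pattern, reusing the `symbols_list_final` name from the question; the example data (two keys instead of ~40) is made up for illustration:

```python
import pandas as pd

# Stand-in data for the 5000 dicts of ~40 items each
symbols_list_final = [{"symbol": f"SYM{i}", "price": float(i)} for i in range(5000)]

# Build one single-row DataFrame per dict, then concatenate exactly once
list_of_dfs = [pd.DataFrame([d]) for d in symbols_list_final]
symbols_dataframe = pd.concat(list_of_dfs, ignore_index=True)

print(symbols_dataframe.shape)  # (5000, 2) with this example data
```

As an aside, `pd.DataFrame(symbols_list_final)` builds the same frame from the list of dicts in a single call, and is typically faster still.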
Now, if creating the df objects themselves is what is taking a while (please share the timings), there are other ways of approaching this issue, such as the pyarrow library (which can be faster depending on your machine).