A faster way of building a pandas.DataFrame from a list of dictionaries than a loop? [Python 3.9]

Time:08-24

I have a list of 5000 dictionaries, where each dictionary has around 40 items. I have built a for loop that is extremely slow - it needs a couple of minutes.

        # symbols_list_final is the list of dictionaries
        symbols_dataframe = pd.DataFrame([symbols_list_final[0]])

        for i in range(len(symbols_list_final) - 1):
            symbol_df_temp = pd.DataFrame([symbols_list_final[i + 1]])
            symbols_dataframe = pd.concat((symbols_dataframe, symbol_df_temp), axis=1)
            print(i)

Is there any way of doing it faster?

EDIT: It's even slower than that. My program is running right now, and it takes about 1 second to do 4-5 iterations.

CodePudding user response:

It seems like you are trying to build multiple single-dict DataFrames and concatenate them into a single variable containing your end_df. First of all, the correct approach involves not concatenating all the time, but running that command only once. So I would recommend stacking the DataFrame objects in a list, and then concatenating:

list_of_dfs = []
for d in list_dict:  # list_dict is your list of dictionaries
    list_of_dfs.append(pd.DataFrame([d]))

Then a single pd.concat(list_of_dfs) is much wiser than redefining your variable on every iteration of the loop: repeated concatenation copies the whole accumulated frame each time, which is why the runtime keeps getting worse.
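A minimal end-to-end sketch of the single-concat approach; the variable names follow the question and the sample data is invented:

```python
import pandas as pd

# Stand-in data for the 5000-dict list from the question
symbols_list_final = [
    {"symbol": f"SYM{i}", "price": float(i), "volume": i * 10}
    for i in range(5000)
]

# Build one tiny DataFrame per dict, then concatenate exactly once
list_of_dfs = [pd.DataFrame([d]) for d in symbols_list_final]
symbols_dataframe = pd.concat(list_of_dfs, ignore_index=True)

# Even simpler: the DataFrame constructor accepts a list of dicts directly,
# skipping the per-dict DataFrames entirely
symbols_dataframe_direct = pd.DataFrame(symbols_list_final)
```

The last line is usually the fastest option here, since it avoids creating 5000 intermediate DataFrame objects altogether.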

Now, if creating the df objects themselves is what takes a while (please share the timings), there are other ways of approaching this issue, such as the library pyarrow (which can be faster depending on your CPU).
