How to speed up an ordinary dataframe loop in Python? Vectorisation? Multiprocessing?

Time:05-14

I have a simple piece of code. Essentially, I want to speed up a loop that builds a new dataframe by filtering and combining existing dataframes. I haven't found an example and would appreciate anyone's help.

df_new = []

for df_i in df:
    df_selected = df[df['good_value'] == df_i_list]
    df_new = pd.concat([df_new, df_selected])

CodePudding user response:

Since your code as posted doesn't run, this is my best guess at what you're after.

Start with a list of dataframes, filter the rows of each into another list, and then concat once at the end.

Since concat is the heavy operation, this makes sure you call it only once, which is how it's meant to be used.

import pandas as pd

dfs = [df1, df2, df3, df4, ...]

sel = [df[df['column_to_filter'] == 'good_value'] for df in dfs]

df_new = pd.concat(sel)  # might be useful to add `ignore_index=True`
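As a minimal, self-contained sketch of the same pattern (the frames and the column name `column_to_filter` below are made up for demonstration; substitute your own dataframes):

```python
import pandas as pd

# Hypothetical example frames standing in for the real data
df1 = pd.DataFrame({"column_to_filter": ["good_value", "bad"], "x": [1, 2]})
df2 = pd.DataFrame({"column_to_filter": ["good_value", "good_value"], "x": [3, 4]})
dfs = [df1, df2]

# Filter each frame first, collecting the results in a list...
sel = [df[df["column_to_filter"] == "good_value"] for df in dfs]

# ...then concatenate exactly once at the end
df_new = pd.concat(sel, ignore_index=True)
```

Calling `pd.concat` once on a list avoids the quadratic copying you get when you grow a dataframe inside a loop, since each loop-iteration `concat` re-copies everything accumulated so far.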

CodePudding user response:

df_new = df[df['good_value'].isin(df_i_list)]

Looping with pd.concat is roughly 4x slower than a single .isin() mask.
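A minimal, self-contained illustration of the `.isin()` approach (the frame and `df_i_list` below are invented for demonstration; in the question they come from the original data):

```python
import pandas as pd

# Hypothetical frame and list of accepted values
df = pd.DataFrame({"good_value": ["a", "b", "c", "a"], "x": [1, 2, 3, 4]})
df_i_list = ["a", "c"]

# One vectorised boolean mask replaces the whole loop-and-concat pattern
df_new = df[df["good_value"].isin(df_i_list)]
```

This keeps all rows whose `good_value` appears in `df_i_list` in a single pass, with no intermediate dataframes to copy.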
