I'm trying to run some tasks in Apache Airflow.
The problem is that memory is very limited, and running the pandas line below gets the task evicted for using too much memory.
Is there a way to do the same thing with these 2 dataframes using less memory?
arct_df = arct_df[~arct_df.im_uuid.isin(dadge_df.im_uuid)]
Sample of arct_df: (not included here)
Assume dadge_df has the same columns, just with different data in the rows.
CodePudding user response:
You shouldn't use Airflow as a data processing framework. That operation will most likely run better in a database, if you have the option.
See Airflow best practices.
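To make the database suggestion concrete, here is a minimal sketch of the same filter written as an anti-join in SQL. It assumes both datasets already live in tables I've called arct and dadge (hypothetical names), and it uses an in-memory SQLite database and a made-up payload column only so the example runs on its own; in practice you would point this at whatever database your Airflow task can reach.

```python
import sqlite3


def rows_not_in_dadge(conn):
    """Return arct rows whose im_uuid does not appear in dadge.

    Rough SQL counterpart of:
        arct_df[~arct_df.im_uuid.isin(dadge_df.im_uuid)]
    Table names `arct` and `dadge` are assumptions for this sketch.
    """
    query = """
        SELECT a.*
        FROM arct AS a
        WHERE NOT EXISTS (
            SELECT 1 FROM dadge AS d WHERE d.im_uuid = a.im_uuid
        )
        ORDER BY a.im_uuid
    """
    return conn.execute(query).fetchall()


# Tiny in-memory demo with made-up data (the payload column is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arct (im_uuid TEXT, payload TEXT)")
conn.execute("CREATE TABLE dadge (im_uuid TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO arct VALUES (?, ?)",
    [("a", "x"), ("b", "y"), ("c", "z")],
)
conn.executemany("INSERT INTO dadge VALUES (?, ?)", [("b", "q")])

remaining = rows_not_in_dadge(conn)
print(remaining)  # [('a', 'x'), ('c', 'z')]
```

This way the database does the filtering and the Airflow worker only has to hold the rows that survive it, rather than both full dataframes plus the boolean mask.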