I am writing a Python function that does a leftanti join on two DataFrames, and the join condition may vary: sometimes the two DataFrames have a single column as the unique key for joining, and sometimes they have more than one column to join on.
So I have written the code below. Please suggest what changes I should make.
def integraty_check(testdata, refdata, cond = []):
    df = func.join_dataframe(testdata, refdata, cond, "leftanti", logger)
    df = df.select(cond)
    func.write_df_as_parquet_file(df, curate_path, logger)
    return df
Here the parameter cond may hold one or more column names, comma separated.
So, how do I pass a dynamic list of column names when I call the function?
What would be the best way to achieve this?
CodePudding user response:
You can use Python's unpacking operator, * (generalized in PEP 448), so that each column name in cond is passed to select as a separate argument:
df = df.select(*cond)
For more examples of the asterisk operator, see "Packing and Unpacking Arguments in Python".
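To see what the unpacking does without a Spark session, here is a minimal sketch in plain Python. The names select_columns and normalize_cond are hypothetical stand-ins invented for illustration (they are not part of PySpark or of the func module in the question), and the comma-splitting is an assumption based on the "comma separated" wording above.

```python
def select_columns(*cols):
    """Hypothetical stand-in for df.select: returns the names it was given."""
    return list(cols)


def normalize_cond(cond):
    """Turn a comma-separated string or a list/tuple of names into a list.

    Assumption: a caller may pass either "id, country" or ["id", "country"].
    """
    if isinstance(cond, str):
        # Split on commas and drop surrounding whitespace and empty pieces.
        return [name.strip() for name in cond.split(",") if name.strip()]
    return list(cond)


# One join key and several join keys go through the same code path.
single_key = normalize_cond("id")
multi_key = normalize_cond("id, country")
list_key = normalize_cond(["id", "country", "year"])

# The * operator spreads the list into separate positional arguments,
# so select_columns(*["id", "country"]) is select_columns("id", "country").
selected_single = select_columns(*single_key)
selected_multi = select_columns(*multi_key)
selected_list = select_columns(*list_key)

print(selected_single)
print(selected_multi)
print(selected_list)
```

With this approach the caller always hands over a list, for example integraty_check(testdata, refdata, ["id", "country"]), and the function body does not need to know how many key columns there are.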