I have searched through a large amount of documentation to try to find an example of what I'm trying to do. I admit that the bigger issue may be my lack of python expertise. So i'm reaching out here in hopes that someone can point me in the right direction. I am trying to create a python function that dynamically queries tables based on a function parameters. Here is an example of what i'm trying to do:
def validateData(_ses, table_name,sel_col,join_col, data_state, validation_state):
sdf_t1 = _ses.table(table_name).select(sel_col).filter(col('state') == data_state)
sdf_t2 = _ses.table(table_name).select(sel_col).filter(col('state') == validation_state)
df_join = sdf_t1.join(sdf_t2, [sdf_t1[i] == sdf_t2[i] for i in join_col],'full')
return df_join.to_pandas()
This would be called like this:
df = validateData(ses,'table_name',[col('c1'),col('c2')],[col('c2'),col('c3')],'AZ','TX')
this issue i'm having is with line 5 from the funtion:
df_join = sdf_t1.join(sdf_t2, [col(sdf_t1[i]) == col(sdf_t2[i]) for i in join_col],'full')
I know that code is incorrect, but I'm hoping it explains what i'm trying to do. If anyone has any advice on if this is possible or how, I would greatly appreciate it.
CodePudding user response:
Instead of joining in data frame, i think its easier to use a direct SQL and pull the data in a snow frame and convert it to a pandas data frame.
from snowflake.snowpark import Session
import pandas as pd
#snow df creation using SQL
data = session.sql("select t1.col1, t2.col2, t2.col2 from mytable t1 full outer join mytable2 t2 on t1.id=t2.id where t1.col3='something'")
#Convert snow DF to Pandas DF. You can use this pandas data frame.
data= pd.DataFrame(data.collect())
CodePudding user response:
Essentially what you need is to create a python expression from two lists of variables. I don't have a better idea than using eval.
Maybe try eval(" & ".join(["(col(sdf_t1[i]) == col(sdf_t2[i]))" for i in join_col])
. Be mindful that I have not completely test this but just to toss an idea.