Home > front end >  Snowpark-Python Dynamic Join
Snowpark-Python Dynamic Join

Time:01-28

I have searched through a large amount of documentation to try to find an example of what I'm trying to do. I admit that the bigger issue may be my lack of python expertise. So i'm reaching out here in hopes that someone can point me in the right direction. I am trying to create a python function that dynamically queries tables based on a function parameters. Here is an example of what i'm trying to do:

def validateData(_ses, table_name,sel_col,join_col, data_state, validation_state):
 
    sdf_t1 = _ses.table(table_name).select(sel_col).filter(col('state') == data_state)
    sdf_t2 = _ses.table(table_name).select(sel_col).filter(col('state') == validation_state)
    df_join = sdf_t1.join(sdf_t2, [sdf_t1[i] == sdf_t2[i] for i in join_col],'full')
    return df_join.to_pandas()

This would be called like this:

df = validateData(ses,'table_name',[col('c1'),col('c2')],[col('c2'),col('c3')],'AZ','TX')

this issue i'm having is with line 5 from the funtion:

df_join = sdf_t1.join(sdf_t2, [col(sdf_t1[i]) == col(sdf_t2[i]) for i in join_col],'full')

I know that code is incorrect, but I'm hoping it explains what i'm trying to do. If anyone has any advice on if this is possible or how, I would greatly appreciate it.

CodePudding user response:

Instead of joining in data frame, i think its easier to use a direct SQL and pull the data in a snow frame and convert it to a pandas data frame.

from snowflake.snowpark import Session
import pandas as pd

#snow df creation using SQL
data = session.sql("select t1.col1, t2.col2, t2.col2 from mytable t1 full outer join mytable2 t2 on t1.id=t2.id where t1.col3='something'")

#Convert snow DF to Pandas DF. You can use this pandas data frame.
data= pd.DataFrame(data.collect())

CodePudding user response:

Essentially what you need is to create a python expression from two lists of variables. I don't have a better idea than using eval.

Maybe try eval(" & ".join(["(col(sdf_t1[i]) == col(sdf_t2[i]))" for i in join_col]). Be mindful that I have not completely test this but just to toss an idea.

  • Related