I'm in the process of converting some SAS code to PySpark. We previously used a macro variable for the WHERE clause in this code. In adapting it to PySpark, I'm trying to pass a list of months to the WHERE clause, but I keep getting errors. I want the SQL query to pull all data from those three months. Any pointers?
month_list = ['202107', '202108', '202109']
sql_query = """ (SELECT *
FROM Table_Blah
WHERE (to_char(DateVariable,'yyyymm') IN '{}')
) as table1""".format(month_list)
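Printing the rendered string is a quick way to see what the database actually receives. A minimal sketch reusing the question's own query: str.format inserts Python's list representation, square brackets included, inside the single quotes, so the IN clause is not valid SQL.

```python
month_list = ['202107', '202108', '202109']

# Render the query exactly as written in the question.
sql_query = """ (SELECT *
FROM Table_Blah
WHERE (to_char(DateVariable,'yyyymm') IN '{}')
) as table1""".format(month_list)

# The WHERE line comes out as:
# WHERE (to_char(DateVariable,'yyyymm') IN '['202107', '202108', '202109']')
print(sql_query)
```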
Answer:
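Pass the list as a tuple so the formatted string has valid SQL syntax. Python renders a tuple of strings as ('202107', '202108', '202109'), which is exactly the form an IN list expects: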
month_list = ['202107', '202108', '202109']
sql_query = """ (SELECT *
FROM Table_Blah
WHERE (to_char(DateVariable,'yyyymm') IN {})
) as table1""".format(tuple(month_list))
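You also don't need the apostrophes around the {} placeholder in the IN clause, since the tuple's string representation already quotes each value.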
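One caveat with the tuple approach: a one-element list renders as ('202107',), and that trailing comma is not valid SQL. Below is a minimal sketch of a hypothetical helper, build_month_query (not part of the original answer), that quotes each value itself so any list length works. Because the values are pasted directly into the SQL text, it also checks that each one is a six-digit yyyymm string; for untrusted input, a parameterized query is the safer choice.

```python
import re


def build_month_query(months):
    """Return the subquery string with an IN list built from ``months``.

    Hypothetical helper: it quotes each value explicitly instead of relying
    on tuple repr, so a single month does not produce a trailing comma.
    """
    if not months:
        raise ValueError("months must not be empty")
    # Values are inserted into SQL text, so only accept six-digit yyyymm strings.
    for m in months:
        if not re.fullmatch(r"\d{6}", m):
            raise ValueError("unexpected month value: {!r}".format(m))
    in_list = ", ".join("'{}'".format(m) for m in months)
    return """ (SELECT *
FROM Table_Blah
WHERE (to_char(DateVariable,'yyyymm') IN ({}))
) as table1""".format(in_list)


# Three months: ... IN ('202107', '202108', '202109')) ...
print(build_month_query(['202107', '202108', '202109']))
# One month: ... IN ('202107')) ... with no trailing comma
print(build_month_query(['202107']))
```

The returned string can be used wherever the original subquery string was used; if, as the ") as table1" alias suggests, it is passed as a JDBC subquery, the SQL runs on the source database, which is why a database-side function like to_char works there.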