Home > database >  PySpark: Adding elements from python list into spark.sql() statement
PySpark: Adding elements from python list into spark.sql() statement

Time:02-16

have list in python that is used throughout my code:

pylist = ['A', 'B', 'C', 'D']

I also have have a simple spark.sql() line that I need to execute:

query = spark.sql(
    """
    SELECT col1, col2, col3
    FROM database.table
    WHERE col3 IN ('A', 'B', 'C', 'D')
    """
)

I want to replace the list of elements in the spark.sql() statment with the python list so that that last line in the SQL is

...
AND col3 IN pylist

I am aware of using {} and str.format but I am struggling to understand if that's the correct option here, and how that workd.

CodePudding user response:

Okay, after some more research, I think I found the solution using .format(tuple(pylist))

The statement now is:

pylist = ['A', 'B', 'C', 'D']

s = """
    SELECT col1, col2, col3
    FROM database.table
    WHERE col3 IN {}
    """.format(tuple(pylist))

query = spark.sql(s)

CodePudding user response:

Convert to tuple to get "()" instead of "[]" while converting to string

sql_str="SELECT col1, col2, col3 FROM database.table WHERE col3 IN "   str(tuple(pylist))

query = spark.sql(sql_str)
  • Related