Home > Software design >  How to use python variable in SQL Query in Databricks?
How to use python variable in SQL Query in Databricks?

Time:06-05

I am trying to convert a SQL stored procedure to databricks notebook. In the stored procedure below 2 statements are to be implemented. Here the tables 1 and 2 are delta lake tables in databricks cluster.

I want to use a python variable in place of max_date in SQL query. How to do it?

%sql
DELETE FROM table1 WHERE Date = max_date;
INSERT INTO table1
SELECT * FROM table2 WHERE Date = max_date;

enter image description here

CodePudding user response:

If you are going to run it cell by cell then you can use databricks widgets like

First cell

x=str(datetime.date.today())
dbutils.widgets.text("max_date",x)

Second cell

%sql
select getArgument("max_date") AS max_date

will give you

max_date
2022-06-04

but as mentioned here it does not work when run all is used and ideal way will be to create separate language based notebook and pass variables using %run

Other way is to use spark conf like below

First set a value for the conf [Note-the conf name should have .(dot) in it]

max_date2=str(datetime.date.today())
spark.conf.set("abc.max_dt2", max_date2)

Next try selecting value like below

%sql
select "${abc.max_dt2}" as max_date

It should give same value as above

  • Related