How to do a for or while loop in Databricks SQL

Does anyone know how to write a for or while loop in SQL in Databricks? I have tried many variations on the following SQL Server-style code, but nothing seems to work, and I can't find any reference on it either. I would prefer not to do it in PySpark, as I need to feed the output into more SQL code.

DECLARE @Counter INT
SET @Counter = 1
WHILE (@Counter <= 10)
BEGIN
    PRINT 'The counter value is = ' + CAST(@Counter AS VARCHAR(10))
    SET @Counter = @Counter + 1
END

Any advice would be greatly appreciated.

Thanks.

CodePudding user response:

To achieve that, read the list of tables from SQL into a DataFrame, then use collect() to loop over the table names and create a DataFrame for each one:

# import necessary libraries
import os

# get the list of user tables from SQL Server; for MySQL you can use SHOW TABLES instead
df_tables = spark.read.jdbc(url=os.environ['SQL_JDBC_URL'], table="""(SELECT
  *
FROM
  SYSOBJECTS
WHERE
  xtype = 'U') x""")

# helper: load a table into a DataFrame stored under its own name and return its row count
def create_df(table):
  globals()[table] = spark.read.jdbc(url=os.environ['SQL_JDBC_URL'], table=table)
  return globals()[table].count()

# for each name in the table list, create a DataFrame using the helper
for value in df_tables.select("name").collect():
  create_df(value['name'])
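
Since the question mentions feeding the output back into SQL, a small variation on the loop above could register each table as a temporary view instead of a global Python variable. This is a minimal sketch, assuming the same SQL_JDBC_URL environment variable; the table name in the final query is a placeholder:

# variation: register each table as a temp view so later SQL cells can query it
for value in df_tables.select("name").collect():
  name = value['name']
  df = spark.read.jdbc(url=os.environ['SQL_JDBC_URL'], table=name)
  df.createOrReplaceTempView(name)

# the views are then usable from plain SQL, e.g. (placeholder table name):
spark.sql("SELECT COUNT(*) FROM some_table").show()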

CodePudding user response:

Databricks SQL supports basic SQL queries only, so procedural constructs such as WHILE loops are not supported in the current Databricks SQL version. This would fall under a new feature request.

Note: Databricks SQL provides a simple experience for SQL users who want to run quick ad-hoc queries on their data lake, create multiple visualization types to explore query results from different perspectives, and build and share dashboards. It is not intended to replace ETL workloads, which should continue to run in Python/PySpark.
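
That said, a common workaround in a Databricks notebook is to drive the SQL from a short Python cell. Here is a minimal sketch of the counter loop from the question, with the SQL statement left as a commented placeholder:

# emulate the T-SQL WHILE loop by running SQL from a Python cell
counter = 1
while counter <= 10:
  print(f'The counter value is = {counter}')
  # each iteration can run arbitrary SQL, e.g. (placeholder statement):
  # spark.sql(f"INSERT INTO my_table VALUES ({counter})")
  counter += 1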
