PySpark to Azure SQL Database connection issue

Time: 08-04

I'm trying to connect to an Azure SQL Database from an Azure Synapse workspace notebook using PySpark, and I would like to use Active Directory integrated authentication. Here is what I've tried:

jdbc_df = spark.read \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .option("url", "jdbc:sqlserver://my_server_name.database.windows.net:1433") \
        .option("database","my_db_name") \
        .option("dbtable", "my_table_or_query") \
        .option("authentication", "ActiveDirectoryIntegrated") \
        .option("encrypt", "true") \
        .option("hostNameInCertificate", "*.database.windows.net") \
        .load()

I've also tried the same thing with a different syntax:

jdbcUrl = "jdbc:sqlserver://my_server_name.database.windows.net:1433;database=my_db_name;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;Authentication=ActiveDirectoryIntegrated"

pushdown_query = "SELECT col1 FROM my_table_name"
connectionProperties = {
  "driver" : "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}
df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query)
display(df)

In both cases I get the following error:

IllegalArgumentException: KrbException: Cannot locate default realm

What am I doing wrong?

CodePudding user response:

You can use the code below to read data from an Azure SQL Database with SQL (username/password) authentication:

# define sql database parameters
jdbcHostname = "my_server_name.database.windows.net"
jdbcDatabase = "my_db_name"
jdbcPort = "1433"
username = "username"
password = "XXXXXXXXXX"
jdbcUrl = "jdbc:sqlserver://{0}:{1};database={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)

# passing parameters in connection properties
connectionProperties = {
  "user" : username,
  "password" : password,
  "driver" : "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}

# sample query; a subquery passed as the "table" argument must be aliased
pushdown_query = "(SELECT col1 FROM my_table_name) AS tbl"

# reading data 
df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query, properties=connectionProperties)

# print data
display(df)
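Hard-coding the password in the notebook is fragile. As a hypothetical variant (the Key Vault name "myKeyVault" and secret name "sql-password" are placeholders, and mssparkutils is only available inside a Synapse notebook runtime), the secret could be pulled from Azure Key Vault instead:

```python
# Build the JDBC URL exactly as in the answer above (pure string work,
# runnable anywhere).
jdbcHostname = "my_server_name.database.windows.net"
jdbcPort = 1433
jdbcDatabase = "my_db_name"
jdbcUrl = "jdbc:sqlserver://{0}:{1};database={2}".format(
    jdbcHostname, jdbcPort, jdbcDatabase
)

def read_with_vault_secret(spark):
    # Assumption: runs inside Synapse, where mssparkutils is pre-installed.
    from notebookutils import mssparkutils
    # Placeholder vault/secret names -- replace with your own.
    password = mssparkutils.credentials.getSecret("myKeyVault", "sql-password")
    connectionProperties = {
        "user": "username",
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }
    # Subquery must be aliased when passed as the "table" argument.
    return spark.read.jdbc(
        url=jdbcUrl,
        table="(SELECT col1 FROM my_table_name) AS tbl",
        properties=connectionProperties,
    )
```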

CodePudding user response:

Finally I have found the solution! First, you need a working linked service to the Azure SQL database in your Synapse Analytics workspace that uses the authentication type "System Assigned Managed Identity". Then you can reference it in your PySpark notebook. And don't be confused that the method getConnectionString is used to get the access token: despite its name, it returns a token, not a connection string.

jdbcUrl = "jdbc:sqlserver://my_server_name.database.windows.net:1433;database=my_db_name;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30"

token=TokenLibrary.getConnectionString("AzureSQLLinkedServiceName")

pushdown_query = "(SELECT col1 FROM my_table_name) as tbl"
connectionProperties = {
 "driver" : "com.microsoft.sqlserver.jdbc.SQLServerDriver",
 "accessToken" : token
}
df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query, properties=connectionProperties)
display(df)
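A sketch of the same token-based approach using the dedicated Spark connector format from the question ("com.microsoft.sqlserver.jdbc.spark") instead of the plain JDBC reader; this assumes a Synapse notebook where TokenLibrary and an active SparkSession exist, and "my_table_or_query" is a placeholder:

```python
# Same connection string as in the solution above (runnable string work).
jdbcUrl = (
    "jdbc:sqlserver://my_server_name.database.windows.net:1433;"
    "database=my_db_name;encrypt=true;trustServerCertificate=false;"
    "hostNameInCertificate=*.database.windows.net;loginTimeout=30"
)

def read_with_connector(spark, token):
    # token would come from the Synapse runtime, e.g.:
    #   token = TokenLibrary.getConnectionString("AzureSQLLinkedServiceName")
    return (
        spark.read.format("com.microsoft.sqlserver.jdbc.spark")
        .option("url", jdbcUrl)
        .option("dbtable", "my_table_or_query")  # placeholder table name
        .option("accessToken", token)
        .load()
    )
```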