I have a Spark job written in Scala. On a Dataproc cluster I can read from Azure SQL using AD authentication without problems, but writing to Azure SQL fails with the error below.
NoClassDefFoundError: com/microsoft/aad/adal4j/AuthenticationException
The error occurs only on the Dataproc cluster; the same code works fine on my local machine.
To be clear, I originally got the same error when reading on Dataproc, and I resolved it by using the Maven Shade plugin to relocate the conflicting library. Now the same error is back when writing. Why is the write failing on Dataproc?
Code sample:
Reading from Azure SQL (working fine):
spark.read
  .format("com.microsoft.sqlserver.jdbc.spark")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("encrypt", "false")
  .option("url", url)
  .option("database", database)
  .option("user", user)
  .option("password", password)
  .option("query", query)
  .option("authentication", "ActiveDirectoryPassword")
  .load()
Writing to Azure SQL (failing on Dataproc):
df.write
  .format("jdbc")
  .mode(mode)
  .option("url", url)
  .option("database", database)
  .option("user", user)
  .option("password", password)
  .option("dbtable", table)
  .option("authentication", "ActiveDirectoryPassword")
  .save()
Maven Shade plugin:
<relocation>
  <pattern>com</pattern>
  <shadedPattern>repackaged.com.microsoft</shadedPattern>
  <includes>
    <include>com.microsoft.**</include>
  </includes>
</relocation>
Other Azure dependencies:
<dependency>
  <groupId>com.microsoft.azure</groupId>
  <artifactId>msal4j</artifactId>
  <version>1.10.0</version>
</dependency>
<dependency>
  <groupId>com.microsoft.azure</groupId>
  <artifactId>adal4j</artifactId>
  <version>1.6.7</version>
</dependency>
<dependency>
  <groupId>com.microsoft.azure</groupId>
  <artifactId>spark-mssql-connector_2.12</artifactId>
  <version>1.2.0</version>
</dependency>
Answer:
I resolved the issue by adding .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") to the write operation, after which the write succeeded.
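For reference, here is a minimal sketch of the write with the driver set explicitly. The object AzureSqlWrite and the helpers writeOptions and writeToAzureSql are hypothetical names introduced for illustration; the option keys and the driver class are the ones from the question. It also assumes mode is a SaveMode, which the question does not state. A plausible explanation, not confirmed in this thread, is that without a "driver" option the generic jdbc source picks the driver based on the URL, and on the cluster that may not be the relocated class from the shaded jar.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object AzureSqlWrite {
  // Driver class name taken from the read example in the question.
  val SqlServerDriver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

  // Hypothetical helper: collects the options from the question's write call
  // and adds the explicit "driver" entry that made the write succeed.
  def writeOptions(
      url: String,
      database: String,
      user: String,
      password: String,
      table: String
  ): Map[String, String] =
    Map(
      "driver"         -> SqlServerDriver,
      "url"            -> url,
      "database"       -> database,
      "user"           -> user,
      "password"       -> password,
      "dbtable"        -> table,
      "authentication" -> "ActiveDirectoryPassword"
    )

  // Hypothetical helper: same write as in the question, with all options
  // (including "driver") passed in one call.
  def writeToAzureSql(df: DataFrame, mode: SaveMode, options: Map[String, String]): Unit =
    df.write
      .format("jdbc")
      .mode(mode)
      .options(options)
      .save()
}
```

Usage would look like AzureSqlWrite.writeToAzureSql(df, SaveMode.Append, AzureSqlWrite.writeOptions(url, database, user, password, table)). Keeping the options in one map makes it harder to forget the driver on the write path when it is already set on the read path.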