I'm trying to load two jars into my AWS Glue/Spark write method, but I get an error:
An error occurred while calling o142.save.
: java.lang.SecurityException: class "com.microsoft.sqlserver.jdbc.ISQLServerBulkData"'s signer information does not match signer information of other classes in the same package
at java.lang.ClassLoader.checkCerts(ClassLoader.java:891)
at java.lang.ClassLoader.preDefineClass(ClassLoader.java:661)
at java.lang.ClassLoader.defineClass(ClassLoader.java:754)
at java.security.SecureClas...
My code is below. I tried multiple Glue DynamicFrame write methods, but bulk insert into SQL Server is not working. According to Microsoft, these drivers should do the trick.
Any suggestions on fixing it are very welcome!
def write_df_to_target(self, df, schema_table):
    spark = self.gc.spark_session
    # Request the JDBC driver and the Spark connector as Maven packages.
    spark.builder.config('spark.jars.packages', 'com.microsoft.sqlserver:mssql-jdbc:8.4.1.jre8,com.microsoft.azure:spark-mssql-connector_2.12:1.1.0').getOrCreate()
    credentials = self.get_credentials(self.replica_connection_name)
    # Write through the Microsoft Spark connector (bulk copy).
    df.write \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .option("url", credentials["url"] + ";databaseName=" + self.database_name) \
        .option("dbtable", schema_table) \
        .option("user", credentials["user"]) \
        .option("password", credentials["password"]) \
        .option("batchsize", "50000") \
        .option("numPartitions", "150") \
        .option("bulkCopyTableLock", "true") \
        .save()
Answer:
Using com.microsoft.sqlserver:mssql-jdbc:8.4.1.jre8 is one thing, but you also need the proper version of Microsoft's Spark SQL connector.
The combination of com.microsoft.azure:spark-mssql-connector_2.12_3.0:1.0.0-alpha and com.microsoft.sqlserver:mssql-jdbc:8.4.1.jre8 did not work in my case, as I'm using AWS Glue 3.0 (which is Spark 3.1).
I had to switch to com.microsoft.azure:spark-mssql-connector_2.12:1.2.0, as it's Spark 3.1 compatible.
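To make the version pairing explicit, here is a minimal sketch that picks the connector coordinate from the Spark version. The names CONNECTOR_BY_SPARK, JDBC_DRIVER and connector_packages are hypothetical helpers, not part of any library. Only the Spark 3.1 entry comes from my own case; the Spark 3.0 entry is my assumption based on the connector's release notes, so verify it before relying on it.

```python
# Hypothetical lookup table: Spark minor version -> connector Maven coordinate.
# The "3.1" entry is the one that worked on AWS Glue 3.0; the "3.0" entry is
# an assumption and should be checked against the connector's documentation.
CONNECTOR_BY_SPARK = {
    "3.0": "com.microsoft.azure:spark-mssql-connector_2.12:1.1.0",
    "3.1": "com.microsoft.azure:spark-mssql-connector_2.12:1.2.0",
}

JDBC_DRIVER = "com.microsoft.sqlserver:mssql-jdbc:8.4.1.jre8"


def connector_packages(spark_version):
    """Return a comma-separated package list for the given Spark version."""
    # Reduce e.g. "3.1.1" to "3.1" before the lookup.
    minor = ".".join(spark_version.split(".")[:2])
    try:
        connector = CONNECTOR_BY_SPARK[minor]
    except KeyError:
        raise ValueError(f"no known connector for Spark {spark_version}") from None
    return f"{JDBC_DRIVER},{connector}"
```

Inside a job you could pass spark.version to connector_packages and use the result as the value for spark.jars.packages, failing early on a Spark version the table does not cover.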
def write_df_to_target(self, df, schema_table):
    spark = self.gc.spark_session
    # Connector 1.2.0 is the Spark 3.1 compatible release.
    spark.builder.config('spark.jars.packages', 'com.microsoft.sqlserver:mssql-jdbc:8.4.1.jre8,com.microsoft.azure:spark-mssql-connector_2.12:1.2.0').getOrCreate()
    credentials = self.get_credentials(self.replica_connection_name)
    # Write through the Microsoft Spark connector (bulk copy).
    df.write \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .option("url", credentials["url"] + ";databaseName=" + self.database_name) \
        .option("dbtable", schema_table) \
        .option("user", credentials["user"]) \
        .option("password", credentials["password"]) \
        .option("batchsize", "100000") \
        .option("numPartitions", "15") \
        .save()
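If the SecurityException persists, it may help to check whether more than one jar on the classpath ships the class named in the error, since a signer mismatch arises when classes from the same package are loaded from jars signed differently. Below is a minimal diagnostic sketch. The function jars_containing is a hypothetical helper, and the directory you point it at is an assumption: use wherever your job's jars actually live.

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, jar_dir):
    """Return the names of the jars in jar_dir that contain class_name."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        # A jar is a zip archive, so the standard zipfile module can read it.
        with zipfile.ZipFile(jar) as zf:
            if entry in zf.namelist():
                hits.append(jar.name)
    return hits
```

Calling jars_containing("com.microsoft.sqlserver.jdbc.ISQLServerBulkData", some_dir) and getting more than one jar back would suggest that two copies of the JDBC driver are being loaded together.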