The same code works completely fine on Ubuntu Linux with the same jar files. My Spark is 3.1.2 and Hadoop is 3.2. I've tried every GCS connector version from Maven.
# df is a Spark DataFrame; an inline comment after the line-continuation
# backslash is a syntax error, so it has to go on its own line
val = df.write.format('bigquery') \
    .mode(mode) \
    .option("credentialsFile", "creds.json") \
    .option('table', table) \
    .option("temporaryGcsBucket", bucket) \
    .save()
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/09/17 07:41:50 WARN FileSystem: Cannot load filesystem: java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem could not be instantiated
21/09/17 07:41:50 WARN FileSystem: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkState(ZLjava/lang/String;J)V
21/09/17 07:41:50 WARN FileSystem: Cannot load filesystem: java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem could not be instantiated
21/09/17 07:41:50 WARN FileSystem: java.lang.NoClassDefFoundError: Could not initialize class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
Traceback (most recent call last):
File "c:\sparktest\main.py", line 158, in <module>
val = df.write.format('bigquery') \
File "c:\sparktest\vnenv\lib\site-packages\pyspark\sql\readwriter.py", line 828, in save
self._jwrite.save()
File "c:\sparktest\vnenv\lib\site-packages\py4j\java_gateway.py", line 1304, in __call__
return_value = get_return_value(
File "c:\sparktest\vnenv\lib\site-packages\pyspark\sql\utils.py", line 128, in deco
return f(*a, **kw)
File "c:\sparktest\vnenv\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o50.save.
: java.lang.NoClassDefFoundError: Could not initialize class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
CodePudding user response:
I forgot to add the connector package to my Spark config. On Linux it was already on the classpath, but on Windows nothing pulls it in automatically, so the GCS filesystem classes could not be loaded.
spark = SparkSession \
.builder \
.appName(appName) \
.config(conf=spark_conf) \
.config('spark.jars.packages', 'com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.22.0') \
.getOrCreate()
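If the `GoogleHadoopFileSystem` error still appears after adding the BigQuery connector, the GCS connector and its filesystem mapping can be declared explicitly as well. A minimal sketch, assuming Hadoop 3 and the `hadoop3-2.2.2` connector release (the app name and versions here are illustrative, not from the original post — match them to your own Spark/Hadoop build):

```python
from pyspark.sql import SparkSession

# Pull both connectors from Maven at session start-up and register the
# gs:// filesystem implementation explicitly.
spark = SparkSession.builder \
    .appName("bq-write") \
    .config('spark.jars.packages',
            'com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.22.0,'
            'com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.2') \
    .config('spark.hadoop.fs.gs.impl',
            'com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem') \
    .getOrCreate()
```

The `NoSuchMethodError` on `Preconditions.checkState` in the log is a classic Guava version conflict; the `with-dependencies` BigQuery artifact and the shaded GCS connector builds exist precisely to avoid clashing with the Guava that ships inside Spark.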