Why is PySpark unable to find the bigquery data source, even when the JAR file is provided?

Time:09-24

This is my PySpark configuration. I've followed the steps mentioned here and didn't create a SparkContext.

spark = SparkSession \
        .builder \
        .appName(appName) \
        .config(conf=spark_conf) \
        .config('spark.jars.packages', 'com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.22.0') \
        .config('spark.jars.packages','com.google.cloud.bigdataoss:gcsio:1.5.4') \
        .config('spark.jars', 'gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar,spark-bigquery-with-dependencies_2.12-0.21.1.jar,spark-bigquery-latest_2.11.jar') \
        .config('spark.jars', 'postgresql-42.2.23.jar,bigquery-connector-hadoop2-latest.jar') \
        .getOrCreate()

Then, when I try to write a demo Spark DataFrame to BigQuery:

df.write.format('bigquery') \
        .mode(mode) \
        .option("credentialsFile", "creds.json") \
        .option('table', table) \
        .option("temporaryGcsBucket",bucket) \
        .save()

It throws an error:

File "c:\sparktest\vnenv\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o60.save.
: java.lang.ClassNotFoundException: Failed to find data source: bigquery. Please find packages at http://spark.apache.org/third-party-projects.html

Answer:

My problem was mismatched JAR versions. I am using Spark 3.1.2 and Hadoop 3.2; these are the Maven packages and the code that worked for me.

from pyspark.sql import SparkSession

# The JARs listed under spark.jars have to be downloaded manually.
spark = SparkSession \
    .builder \
    .master('local') \
    .appName('spark-read-from-bigquery') \
    .config('spark.jars.packages','com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.22.0,com.google.cloud.bigdataoss:gcs-connector:hadoop3-1.9.5,com.google.guava:guava:r05') \
    .config('spark.jars','guava-11.0.1.jar,gcsio-1.9.0-javadoc.jar') \
    .getOrCreate()
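
A note on why the single comma-separated string seems to matter: each configuration key on the builder holds one value, so a second .config('spark.jars.packages', ...) call replaces the first rather than adding to it. That is likely what happens in the question's configuration, where the BigQuery connector coordinate is overwritten by the gcsio one. Below is a minimal sketch of building the value in one place. The names join_coordinates and build_session are hypothetical helpers, not part of PySpark, and the coordinates are the ones from the working configuration above.

```python
def join_coordinates(*coords):
    """Join Maven coordinates into one comma-separated string.

    Hypothetical helper: spark.jars.packages takes a single
    comma-separated value, so all coordinates go into one string.
    """
    cleaned = [c.strip() for c in coords if c and c.strip()]
    return ",".join(cleaned)


# Coordinates taken from the working configuration above. The _2.12
# suffix is the Scala version the connector was built for.
PACKAGES = join_coordinates(
    "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.22.0",
    "com.google.cloud.bigdataoss:gcs-connector:hadoop3-1.9.5",
)


def build_session(app_name="spark-bigquery-demo"):
    """Create a local SparkSession with all packages set in one call.

    pyspark is imported here so the helper above can be used without it.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master("local")
        .appName(app_name)
        # Set the key exactly once; a repeated key would overwrite this value.
        .config("spark.jars.packages", PACKAGES)
        .getOrCreate()
    )
```

Calling build_session() and then df.write.format('bigquery') as in the question should find the data source, assuming the packages resolve from Maven and the connector matches the Scala version of your Spark build.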