Class org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider not found when trying to write data to S3


I am trying to write data to an S3 bucket from my local computer:

spark = SparkSession.builder \
    .appName('application') \
    .config("spark.hadoop.fs.s3a.access.key", configuration.AWS_ACCESS_KEY_ID) \
    .config("spark.hadoop.fs.s3a.secret.key", configuration.AWS_ACCESS_SECRET_KEY) \
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
    .getOrCreate()

lines = spark.readStream \
    .format('kafka') \
    .option('kafka.bootstrap.servers', kafka_server) \
    .option('subscribe', kafka_topic) \
    .option("startingOffsets", "earliest") \
    .load()

streaming_query = lines.writeStream \
                    .format('parquet') \
                    .outputMode('append') \
                    .option('path', configuration.S3_PATH) \
                    .start()

streaming_query.awaitTermination()

Hadoop version: 3.2.1, Spark version: 3.2.1

I have added the dependency jars to the pyspark jars folder:

spark-sql-kafka-0-10_2.12:3.2.1, aws-java-sdk-s3:1.11.375, hadoop-aws:3.2.1

I get the following error when executing:

py4j.protocol.Py4JJavaError: An error occurred while calling o68.start.
: java.io.IOException: From option fs.s3a.aws.credentials.provider java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider not found

CodePudding user response:

In my case, it worked in the end by adding the following statement:

.config('spark.hadoop.fs.s3a.aws.credentials.provider', 'org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider')
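For reference, a minimal sketch of how that setting fits into the builder from the question (the configuration object holding the access and secret keys is assumed to be the same as above):

from pyspark.sql import SparkSession

# Sketch: same builder as in the question, with the credentials provider set
# explicitly so S3A uses the static access/secret keys instead of the default
# chain that references the missing IAMInstanceCredentialsProvider class.
spark = SparkSession.builder \
    .appName('application') \
    .config("spark.hadoop.fs.s3a.access.key", configuration.AWS_ACCESS_KEY_ID) \
    .config("spark.hadoop.fs.s3a.secret.key", configuration.AWS_ACCESS_SECRET_KEY) \
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
    .config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider") \
    .getOrCreate()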

Also, all the Hadoop jars in site-packages/pyspark/jars must be the same version: hadoop-aws-3.2.2, hadoop-client-api-3.2.2, hadoop-client-runtime-3.2.2, hadoop-yarn-server-web-proxy-3.2.2.

For version 3.2.2 of hadoop-aws, the aws-java-sdk-s3:1.11.563 package is needed.
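If it helps, one way to keep the versions aligned is to let Spark resolve the jars from Maven instead of copying them into site-packages/pyspark/jars by hand. The coordinates below are a sketch and an assumption: hadoop-aws:3.2.2 pulls in the AWS SDK version it was built against as a transitive dependency, but it still has to match the hadoop-client jars bundled with your PySpark install.

from pyspark.sql import SparkSession

# Sketch: resolve matching dependency versions via Maven coordinates rather
# than managing jar files manually. Versions are assumptions and must match
# the hadoop-client jars shipped with the local PySpark installation.
spark = SparkSession.builder \
    .appName('application') \
    .config("spark.jars.packages",
            "org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1,"
            "org.apache.hadoop:hadoop-aws:3.2.2") \
    .config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider") \
    .getOrCreate()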

CodePudding user response:

I used the same packages as you. In my case, when I added the line below:

.config('spark.hadoop.fs.s3a.aws.credentials.provider', 'org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider')

I got this error:

py4j.protocol.Py4JJavaError: An error occurred while calling o56.parquet.
: java.lang.NoSuchMethodError: 'void com.google.common.base.Preconditions.checkArgument(boolean, java.lang.String, java.lang.Object, java.lang.Object)'

....

To solve this, I installed guava-30.0.
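If it helps, a small sketch (assuming a pip-installed pyspark) to find the Guava jar bundled with the local install, which is the one the newer guava-30.0 jar needs to replace:

import glob
import os
import pyspark

# Sketch: locate the guava jar that PySpark actually puts on the classpath.
# Replacing it with a newer Guava (e.g. guava-30.0) resolves the
# Preconditions.checkArgument NoSuchMethodError caused by an old Guava version.
jars_dir = os.path.join(os.path.dirname(pyspark.__file__), "jars")
print(glob.glob(os.path.join(jars_dir, "guava-*.jar")))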
