How do I configure Spark's "per bucket" settings for complex S3 bucket names?


For "per bucket" settings, if I have an S3 bucket name my.whole.name with dots (periods) in the name, how do I escape or include them in the Spark settings? Quotes do not work:

sparkConf.set('spark.hadoop.fs.s3a."my.whole.name".access.key',<redacted>)

The reference documentation on per-bucket access configuration does not mention bucket names containing dots:

https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Configuring_different_S3_buckets_with_Per-Bucket_Configuration

CodePudding user response:

Try inserting "bucket" into the configuration key and removing the quotes:

 sparkConf.set('spark.hadoop.fs.s3a.bucket.my.whole.name.access.key',<redacted>)

Spark strips the spark.hadoop. prefix and passes the rest into the Hadoop configuration, where the S3A connector resolves the per-bucket key.
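
For reference, this is the documented fs.s3a.bucket.BUCKET.option pattern when the bucket name contains no dots. A minimal PySpark sketch, using a hypothetical bucket name mybucket and placeholder credentials:

    from pyspark import SparkConf

    sparkConf = SparkConf()
    # Per-bucket form: fs.s3a.bucket.<BUCKET>.<option>. The S3A connector
    # copies these onto the base fs.s3a.* options when "mybucket" is opened.
    sparkConf.set('spark.hadoop.fs.s3a.bucket.mybucket.access.key', '<access-key>')
    sparkConf.set('spark.hadoop.fs.s3a.bucket.mybucket.secret.key', '<secret-key>')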

CodePudding user response:

S3A doesn't support per-bucket settings for buckets with dots in their name: there is no way to work out which part of the key is the bucket and which is the option. The AWS documentation says such names are "not recommended for uses other than static website hosting".

See SPARK-32766 for more on this.
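
Since the per-bucket mechanism can't be used here, one possible workaround (a sketch, not from the answer above) is to set the credentials on the base fs.s3a.* options for the whole session, which do not embed the bucket name in the key; enabling path-style access may also be needed, since a dotted name breaks the TLS wildcard certificate used for virtual-hosted-style URLs:

    from pyspark.sql import SparkSession

    # Base fs.s3a.* options apply to every bucket in this session, so the
    # bucket name never appears inside a configuration key.
    spark = (SparkSession.builder
             .config('spark.hadoop.fs.s3a.access.key', '<access-key>')
             .config('spark.hadoop.fs.s3a.secret.key', '<secret-key>')
             # Dotted bucket names don't match the *.s3.amazonaws.com
             # wildcard certificate, so path-style access may be required.
             .config('spark.hadoop.fs.s3a.path.style.access', 'true')
             .getOrCreate())

    df = spark.read.parquet('s3a://my.whole.name/some/path/')

The trade-off is that these credentials apply to every S3A bucket the session touches, which is exactly what per-bucket configuration was designed to avoid.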
