I am trying to save a data file to S3 from an AWS Glue job. I am writing a Spark DataFrame like this:
df.coalesce(4).write.mode("overwrite").format("parquet").save("s3://my_own_bucket/")
When I test it, the code returns an error:
pyspark.sql.utils.IllegalArgumentException: 'Can not create a Path from an empty string'
If I instead write into a folder inside the bucket, like
df.coalesce(4).write.mode("overwrite").format("parquet").save("s3://my_own_bucket/folder1")
the files are written into the folder "folder1" without problems.
Does anyone know why I cannot write at the bucket (root) level?
PS: I also tried using a DynamicFrame and it works. I want to know why "normal" Spark does not work.
final_sink = glueContext.write_dynamic_frame.from_options(
    frame = frame1,
    connection_type = "s3",
    connection_options = {"path": "s3://my_own_bucket"},
    format = "parquet")
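For context, a minimal sketch of how a DataFrame like df can be turned into a DynamicFrame before that call (standard Glue boilerplate; the fromDF conversion and the names used here are assumptions, not taken from the actual job):

from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame

# standard Glue job setup (assumed)
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)

# convert the Spark DataFrame from above into a Glue DynamicFrame
frame1 = DynamicFrame.fromDF(df, glueContext, "frame1")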
CodePudding user response:
The answer is "no". You can't write a DataFrame to the root of any file system; it's just that nobody has ever tried to use file:// or hdfs:// as a destination either. Root dirs are "special": you can't delete them, the path relative to the root is the empty string "", etc. Nobody has ever sat down to add the feature.
See SPARK-34298.
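If the output files really must end up at the bucket root, one workaround (a sketch only, not from the Spark ticket; the bucket and prefix names are placeholders) is to write under a prefix and then move the objects up with boto3:

import boto3

# Spark can write under a prefix without problems:
# df.coalesce(4).write.mode("overwrite").format("parquet").save("s3://my_own_bucket/tmp_output")

s3 = boto3.client("s3")
bucket = "my_own_bucket"   # placeholder bucket
prefix = "tmp_output/"     # placeholder prefix the job wrote to

# list the part files under the prefix (pagination omitted for brevity)
resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
for obj in resp.get("Contents", []):
    key = obj["Key"]
    new_key = key[len(prefix):]   # strip the prefix so the object lands at the bucket root
    if not new_key:
        continue
    s3.copy_object(Bucket=bucket, Key=new_key,
                   CopySource={"Bucket": bucket, "Key": key})
    s3.delete_object(Bucket=bucket, Key=key)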
CodePudding user response:
Hi, please remove the trailing "/" from the path:
df.coalesce(4).write.mode("overwrite").format("parquet").save("s3://my_own_bucket")