I am trying to save a data file to S3 from an AWS Glue job. I am writing a Spark DataFrame like this:
df.coalesce(4).write.mode("overwrite").format("parquet").save("s3://my_own_bucket/")
When I test it, the code returns an error:
pyspark.sql.utils.IllegalArgumentException: 'Can not create a Path from an empty string'
If I instead write into a folder inside the bucket, like
df.coalesce(4).write.mode("overwrite").format("parquet").save("s3://my_own_bucket/folder1")
the files are written into the folder "folder1" without problems.
Does anyone know why I cannot write at the bucket (root) level?
PS: I also tried using a DynamicFrame and it works. I want to know why "normal" Spark does not work.
final_sink = glueContext.write_dynamic_frame.from_options(
    frame = frame1,
    connection_type = "s3",
    connection_options = {"path": "s3://my_own_bucket"},
    format = "parquet")
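For context, a minimal sketch of how a DataFrame like df can be turned into a DynamicFrame before that call (standard Glue boilerplate; the fromDF conversion and the names used here are assumptions, not taken from the actual job):

from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame

# standard Glue job setup (assumed)
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)

# convert the Spark DataFrame from above into a Glue DynamicFrame
frame1 = DynamicFrame.fromDF(df, glueContext, "frame1")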
CodePudding user response:
The answer is "no". You can't write a DataFrame to the root of any file system; it's just that nobody has ever tried to use file:// or hdfs:// as a destination either. Root dirs are "special": you can't delete them, the path relative to the root is the empty string "", etc. Nobody has ever sat down to add the feature.
See SPARK-34298.
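If the output files really must end up at the bucket root, one workaround (a sketch only, not from the Spark ticket; the bucket and prefix names are placeholders) is to write under a prefix and then move the objects up with boto3:

import boto3

# Spark can write under a prefix without problems:
# df.coalesce(4).write.mode("overwrite").format("parquet").save("s3://my_own_bucket/tmp_output")

s3 = boto3.client("s3")
bucket = "my_own_bucket"   # placeholder bucket
prefix = "tmp_output/"     # placeholder prefix the job wrote to

# list the part files under the prefix (pagination omitted for brevity)
resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
for obj in resp.get("Contents", []):
    key = obj["Key"]
    new_key = key[len(prefix):]   # strip the prefix so the object lands at the bucket root
    if not new_key:
        continue
    s3.copy_object(Bucket=bucket, Key=new_key,
                   CopySource={"Bucket": bucket, "Key": key})
    s3.delete_object(Bucket=bucket, Key=key)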
CodePudding user response:
Hi, please remove the trailing "/" from the path:
df.coalesce(4).write.mode("overwrite").format("parquet").save("s3://my_own_bucket")