I am saving CSV files as a stream with PySpark. When I save files locally with output mode 'overwrite' there is no problem, but when I containerize my Spark app it throws an error. The code and the error are below:
df.write.format("csv").mode("overwrite").save("/app/files")
java.io.IOException: Unable to clear output directory file:/app/files prior to writing to it
I think the error is due to permissions, so I tried USER root in the Dockerfile, but that did not fix the error.
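To verify the permission theory, a quick diagnostic can be run inside the container before the Spark write. This is a hedged sketch (the path /app/files comes from the error message; `describe_dir` is a hypothetical helper, not part of Spark):

```python
import os
import pwd
import stat

def describe_dir(path):
    """Report ownership and writability of a directory, creating it if absent."""
    os.makedirs(path, exist_ok=True)
    st = os.stat(path)
    owner = pwd.getpwuid(st.st_uid).pw_name   # user that owns the directory
    mode = stat.filemode(st.st_mode)          # e.g. 'drwxr-xr-x'
    writable = os.access(path, os.W_OK)       # can the current process write here?
    return {"owner": owner, "mode": mode, "writable": writable}

# Inside the container, point this at /app/files instead
print(describe_dir("/tmp/spark_out_check"))
```

If `writable` comes back False, or the owner is not the user the Spark driver runs as, that would explain the failure to clear the directory.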
CodePudding user response:
How about qualifying the path with an explicit local filesystem scheme:
df.write.format("csv").mode("overwrite").save("file:///app/files")
Without an explicit scheme, Spark resolves the path against its configured default filesystem, which can differ once the app runs inside a container.
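If permissions turn out to be the real problem, another option is to create the output directory up front and relax its permissions before the Spark write. A minimal sketch, assuming the driver process is allowed to create the directory (`ensure_writable` is a hypothetical helper name):

```python
import os

def ensure_writable(path, mode=0o775):
    """Create the output directory if needed and make it group-writable."""
    os.makedirs(path, exist_ok=True)
    os.chmod(path, mode)
    if not os.access(path, os.W_OK):
        raise PermissionError(f"{path} is not writable by uid {os.geteuid()}")
    return path

# Then write with the explicit local scheme:
# target = ensure_writable("/app/files")
# df.write.format("csv").mode("overwrite").save(f"file://{target}")
```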
CodePudding user response:
What is the running mode of the Spark program: local, standalone, YARN, or something else? Also check whether the Spark program inside Docker is actually running as the root user, for example:
ps -ef | grep spark_app_name
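The same check can be done from Python inside the container. If the printed user is not the one that owns /app/files, the overwrite will fail even though the Dockerfile says USER root (a minimal sketch):

```python
import os
import pwd

# Effective user the current process (and hence any driver-local file I/O) runs as
euid = os.geteuid()
user = pwd.getpwuid(euid).pw_name
print(f"running as uid={euid} user={user}")
```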