Py4JJavaError: An error occured while calling o8660.save when trying to save csv file locally-CodePudding

I want to save a csv file locally rather than saving it to Hadoop file system. I got the following error when I use the path that starts with

> 'file://'

How can I fixed this? Or How can I save the file locally without any errors?

CodePudding user response：

I'm afraid its not gonna work like that because saving the data locally implies it must all be present on the driver. Per pyspark docs, the path parameter in pyspark.sql.DataFrameWriter.csv is a "path in any Hadoop supported file system".

So as far as I can tell, there are several alternatives:

Save dataframe to HDFS/Hadoop and then copy it to local FS hdfs dfs -mget .... This would be most straightforward and preferred way.
Do df.collect() to bring complete dataframe to the driver, and then write it to local FS. This might not be feasible for large dataframes, since it can crash the driver with OOM.
use df.toLocalIterator() to bring data to the driver one partition at a time, and then write it to local FS. This avoids / lessens OOM chances presented by previous option.