I'm trying to save/convert a parquet file to CSV on Apache Spark with Databricks, but I'm not having much luck.
The following code successfully writes to a folder called tempDelta:
df.coalesce(1).write.format("parquet").mode("overwrite").option("header","true").save(saveloc + "/tempDelta")
I would then like to convert the parquet file to CSV by chaining a .csv() call onto the same write:
df.coalesce(1).write.format("parquet").mode("overwrite").option("header","true").save(saveloc + "/tempDelta").csv(saveloc + "/tempDelta")
This fails with:
AttributeError Traceback (most recent call last)
<command-2887017733757862> in <module>
----> 1 df.coalesce(1).write.format("parquet").mode("overwrite").option("header","true").save(saveloc + "/tempDelta").csv(saveloc + "/tempDelta")
AttributeError: 'NoneType' object has no attribute 'csv'
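The AttributeError occurs because DataFrameWriter.save() returns None, so there is nothing to call .csv() on; each output format needs its own write. A minimal sketch of issuing the two writes separately (assuming saveloc is a base path defined earlier, and that the CSV target folder, which is hypothetical here, sits outside any Delta table):

# First write: the parquet copy
df.coalesce(1).write.format("parquet").mode("overwrite").save(saveloc + "/tempDelta")

# Second, separate write: the CSV copy ("/tempDeltaCsv" is a hypothetical target folder)
df.coalesce(1).write.mode("overwrite").option("header", "true").csv(saveloc + "/tempDeltaCsv")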
I have also tried the following after writing to the location:
df.write.option("header","true").csv(saveloc + "/tempDelta2")
But I get this error:
A transaction log for Databricks Delta was found at `/CURATED/F1Area/F1Domain/final/_delta_log`,
but you are trying to write to `/CURATED/F1Area/F1Domain/final/tempDelta2` using format("csv"). You must use
'format("delta")' when reading and writing to a delta table.
And when I try to save as a CSV to a folder that isn't a Delta folder, I get the following error:
df.write.option("header","true").csv("testfolder")
AnalysisException: CSV data source does not support struct data type.
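This last error is about the DataFrame itself: CSV is a flat format, so struct (nested) columns must be flattened or serialized before writing. A minimal sketch, which serializes every struct column to a JSON string and passes the other columns through unchanged:

from pyspark.sql import functions as F
from pyspark.sql.types import StructType

# Convert struct columns to JSON strings so they fit CSV's flat model;
# non-struct columns are kept as-is
flat_cols = [
    F.to_json(F.col(f.name)).alias(f.name) if isinstance(f.dataType, StructType)
    else F.col(f.name)
    for f in df.schema.fields
]
df.select(flat_cols).write.option("header", "true").csv("testfolder")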
Can someone let me know the best way of saving/converting from parquet to CSV with Databricks?
CodePudding user response:
You can use either of the two options below:
1. df.write.option("header","true").csv(path)
2. df.write.format("csv").save(path)
Note: You can't set the format to parquet and call the .csv() function in the same write; each write produces a single output format.
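For example, a minimal end-to-end sketch of the conversion, reading the parquet copy back and writing it out as CSV (csv_path is a hypothetical non-Delta location, and this assumes the DataFrame has no struct columns left):

# Read the parquet file(s) back into a DataFrame
parquet_df = spark.read.parquet(saveloc + "/tempDelta")

# Write it out as a single CSV file with a header row
parquet_df.coalesce(1).write.mode("overwrite").option("header", "true").csv(csv_path)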