I have the following parquet files listed in my lake, and I would like to convert them to CSV.
I have attempted to carry out the conversion using the suggestions on SO, but I keep getting the following AttributeError:
AttributeError: 'str' object has no attribute 'write'
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<command-507817377983169> in <module>
----> 1 df.write.format("csv").save("/mnt/lake/RAW/export/")
AttributeError: 'str' object has no attribute 'write'
I have created a dataframe, 'df', for the location where the parquet files reside, and it gives the following output:
Out[71]: '/mnt/lake/CUR/CURATED/F1Area/F1Domain/myfinal'
When I attempt to write / convert the parquet files to CSV using either of the following, I get the above error:
df.write.format("csv").save("/mnt/lake/RAW/export/")
df.write.csv(path)
To read the files, I'm entering:
df = spark.read.parquet("/mnt/lake/CUR/CURATED/F1Area/F1Domain/myfinal/")
but I get the following error message:
A transaction log for Databricks Delta was found at /mnt/lake/CUR/CURATED/F1Area/F1Domain/myfinal/_delta_log, but you are trying to read from /mnt/lake/CUR/CURATED/F1Area/F1Domain/myfinal/ using format("parquet"). You must use 'format("delta")' when reading and writing to a delta table. To disable this check, SET spark.databricks.delta.formatCheck.enabled=false
CodePudding user response:
The file you have stored is in Delta format, so read it with the following command:
df = spark.read.format("delta").load(path_to_data)
Once loaded, display the first few rows with display(df) to make sure it has loaded properly.
If the output is as expected, then you can write it as CSV to your desired location.
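A minimal end-to-end sketch, assuming a Databricks notebook (where spark and display are predefined) and the mount paths from the question; coalesce(1) and the header option are optional choices, not requirements:

# Read the Delta table (the _delta_log directory means this is Delta, not plain parquet)
df = spark.read.format("delta").load("/mnt/lake/CUR/CURATED/F1Area/F1Domain/myfinal/")

# Sanity check that the load worked
display(df)

# Write out as CSV; coalesce(1) merges the output into a single file
df.coalesce(1).write.format("csv").option("header", "true").mode("overwrite").save("/mnt/lake/RAW/export/")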
CodePudding user response:
The type of the df variable is string, and its value is /mnt/lake/CUR/CURATED/F1Area/F1Domain/myfinal. You need to read the file first and make sure df is a PySpark DataFrame before calling df.write.
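A short sketch of the fix, assuming the paths from the question:

print(type(df))   # <class 'str'> -- this is why df.write fails

# Re-assign df so it is an actual PySpark DataFrame (Delta format, per the error message)
df = spark.read.format("delta").load("/mnt/lake/CUR/CURATED/F1Area/F1Domain/myfinal/")

# Now df.write works
df.write.format("csv").save("/mnt/lake/RAW/export/")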