I have the following parquet files listed in my lake, and I would like to convert them to CSV.
I have attempted to carry out the conversion using the suggestions on SO, but I keep getting the following AttributeError:
AttributeError: 'str' object has no attribute 'write'
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<command-507817377983169> in <module>
----> 1 df.write.format("csv").save("/mnt/lake/RAW/export/")
AttributeError: 'str' object has no attribute 'write'
I have created a dataframe, 'df', for the location where the parquet files reside, and it gives the following output:
Out[71]: '/mnt/lake/CUR/CURATED/F1Area/F1Domain/myfinal'
When I attempt to write / convert the parquet files to CSV using either of the following, I get the above error:
df.write.format("csv").save("/mnt/lake/RAW/export/")
df.write.csv(path)
To read the files, I'm entering:
df = spark.read.parquet("/mnt/lake/CUR/CURATED/F1Area/F1Domain/myfinal/")
but I get the following error message:
A transaction log for Databricks Delta was found at /mnt/lake/CUR/CURATED/F1Area/F1Domain/myfinal/_delta_log, but you are trying to read from /mnt/lake/CUR/CURATED/F1Area/F1Domain/myfinal/ using format("parquet"). You must use 'format("delta")' when reading and writing to a delta table. To disable this check, SET spark.databricks.delta.formatCheck.enabled=false
CodePudding user response:
The file you have stored is in Delta format, so read it with the following command:
df = spark.read.format("delta").load(path_to_data)
Once loaded, display the first few rows with display(df) to make sure it has loaded properly.
If the output is as expected, then you can write it as CSV to your desired location.
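A minimal end-to-end sketch, assuming a Databricks notebook (where spark and display are predefined) and the mount paths from the question; coalesce(1) and the header option are optional choices, not requirements:

# Read the Delta table (the _delta_log directory means this is Delta, not plain parquet)
df = spark.read.format("delta").load("/mnt/lake/CUR/CURATED/F1Area/F1Domain/myfinal/")

# Sanity check that the load worked
display(df)

# Write out as CSV; coalesce(1) merges the output into a single file
df.coalesce(1).write.format("csv").option("header", "true").mode("overwrite").save("/mnt/lake/RAW/export/")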
CodePudding user response:
The type of the df variable is string, and its value is /mnt/lake/CUR/CURATED/F1Area/F1Domain/myfinal. You need to read the file first and make sure df is a PySpark DataFrame before calling df.write.
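A short sketch of the fix, assuming the paths from the question:

print(type(df))   # <class 'str'> -- this is why df.write fails

# Re-assign df so it is an actual PySpark DataFrame (Delta format, per the error message)
df = spark.read.format("delta").load("/mnt/lake/CUR/CURATED/F1Area/F1Domain/myfinal/")

# Now df.write works
df.write.format("csv").save("/mnt/lake/RAW/export/")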