While writing a spark dataframe using write method to a csv file, the csv file is getting populated as "" for null strings
101|abc|""|555
102|""|xyz|743
Using the below code:
dataFrame
.coalesce(16)
.write
.format("csv")
.option("delimiter", "|")
.option("treatEmptyValuesAsNulls", "true")
.option("nullValue", null)
.option("emptyValue", null)
.mode(SaveMode.Overwrite)
.save(path)
Expected output:
101|abc|null|555
102|null|xyz|743
Spark version 3.2 and Scala version 2.1
CodePudding user response:
The issue seems to be in the option definition; the option values should be specified as String "null"
instead of null
, like:
dataFrame.coalesce(16).write.format("csv")
.option("delimiter", "|")
.option("treatEmptyValuesAsNulls", "true")
.option("nullValue", "null")
.option("emptyValue", "null")
.mode(SaveMode.Overwrite).save(path)
CodePudding user response:
dataFrame.coalesce(16).write.format("csv")
.option("delimiter", "|")
.option("treatEmptyValuesAsNulls", "true")
.option("nullValue", "\u0000")
.option("emptyValue", "\u0000")
.mode(SaveMode.Overwrite).save(path)
This solved the issue.