Home > database >  Replace "" (empty string) to null using Spark dataframe.write method while writing to csv
Replace "" (empty string) to null using Spark dataframe.write method while writing to csv

Time:11-01

While writing a spark dataframe using write method to a csv file, the csv file is getting populated as "" for null strings

101|abc|""|555
102|""|xyz|743

Using the below code:

dataFrame
  .coalesce(16)
  .write
  .format("csv")
  .option("delimiter", "|")
  .option("treatEmptyValuesAsNulls", "true")
  .option("nullValue", null)
  .option("emptyValue", null)
  .mode(SaveMode.Overwrite)
  .save(path)

Expected output:

101|abc|null|555
102|null|xyz|743

Spark version 3.2 and Scala version 2.1

CodePudding user response:

The issue seems to be in the option definition; the option values should be specified as String "null" instead of null, like:

dataFrame.coalesce(16).write.format("csv")
.option("delimiter", "|")
.option("treatEmptyValuesAsNulls", "true")
.option("nullValue", "null")
.option("emptyValue", "null")
.mode(SaveMode.Overwrite).save(path)

CodePudding user response:

dataFrame.coalesce(16).write.format("csv")
.option("delimiter", "|")
.option("treatEmptyValuesAsNulls", "true")
.option("nullValue", "\u0000")
.option("emptyValue", "\u0000")
.mode(SaveMode.Overwrite).save(path)

This solved the issue.

  • Related