How to load a JSON file that has double quotes within a string into a DataFrame in Spark Scala

Time:09-23

I have the JSON file below, which I want to read into a DataFrame, but I am getting an error because the file has double quotes inside the string values. For example:

data:{ "Field1":"val"ue 1", "Field2":"value2", "Field3":"va"lu"e3" }

Required output:

Field1,Field2,Field3
Value1,value2,value3

CodePudding user response:

Your JSON is not valid (because of the unescaped double quotes inside the string values), which is why you get an error when you read the file with the Spark data source API or with any other JSON parser.

What you can do is read your file as a Dataset[String], clean each string with a regex that removes the stray double quotes, and finally use the from_json function to parse each string and convert the Dataset[String] into a Dataset[<your case class>].
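The cleaning step is the least obvious part, so here is a minimal sketch of it in plain Scala. The names JsonCleaner and removeStrayQuotes are hypothetical, and the regex is only a heuristic: it assumes each line holds one flat JSON object, and it treats a double quote as structural when it follows `{`, `,` or `:` or precedes `:`, `,` or `}` (ignoring whitespace). Every other double quote is removed. It will misfire if a value itself contains a sequence such as `",` or `":`.

```scala
import scala.util.matching.Regex

// Hypothetical helper, not part of Spark: removes double quotes that do not
// look like JSON string delimiters.
object JsonCleaner {
  // A quote is kept when it is preceded by '{', ',' or ':' (plus up to 10
  // whitespace characters) or followed by optional whitespace and ':', ',' or '}'.
  // Any other quote is assumed to be a stray one inside a value.
  private val strayQuote: Regex =
    """(?<![{,:]\s{0,10})"(?!\s*[:,}])""".r

  def removeStrayQuotes(line: String): String =
    strayQuote.replaceAllIn(line, "")

  def main(args: Array[String]): Unit = {
    // Sample line taken from the question.
    val broken = """{ "Field1":"val"ue 1", "Field2":"value2", "Field3":"va"lu"e3" }"""
    println(removeStrayQuotes(broken))
  }
}
```

In Spark, a function like this could be applied to the lines returned by `spark.read.textFile(path)`, for example with `.map((line: String) => JsonCleaner.removeStrayQuotes(line))`. The resulting `value` column can then be parsed with `from_json(col("value"), schema)` and converted with `.as[...]` to your case class.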
