Removing root element from json record in Spark

Time:09-26

My input JSON data has a root element that wraps the data I actually want. I would like to remove it so that each record contains only the target data.

Input data:

{"rootElement": {"firstName": "John", "lastName": "Doe", "age": 11}}
{"rootElement": {"firstName": "Jane", "lastName": "Doe", "age": 33}}
{"rootElement": {"firstName": "Scott", "lastName": "Smith", "age": 22}}

Expected output:

{"firstName": "John", "lastName": "Doe", "age": 11}
{"firstName": "Jane", "lastName": "Doe", "age": 33}
{"firstName": "Scott", "lastName": "Smith", "age": 22}

I tried this so far:

sparkSession.read.json(inputFileLocation).toDF().map(func => func.getObject("rootElement"))

but it won't compile.

CodePudding user response:

sparkSession.read.json already returns a DataFrame, so there is no need to call toDF().

Try this:

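import sparkSession.implicits._  // needed for the $"..." column syntax
val df = sparkSession.read.json("your path")
// rootElement.* expands the struct into one top-level column per field
df.select($"rootElement.*").write.json("your output path")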
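
To see what the select does without touching any files, here is a minimal sketch that runs the same expansion on the three sample records from the question. The local SparkSession, the object name FlattenRootElement and the helper flatten are assumptions made for illustration; reading from an in-memory Dataset[String] requires Spark 2.2 or later.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object FlattenRootElement {
  // Hypothetical helper: parses JSON lines and expands the fields of the
  // rootElement struct into top-level columns.
  def flatten(spark: SparkSession, lines: Dataset[String]): DataFrame = {
    import spark.implicits._
    spark.read.json(lines).select($"rootElement.*")
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption); a real job would reuse its own session.
    val spark = SparkSession.builder()
      .appName("flatten-root-element")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample records from the question, held in memory instead of read from a file.
    val input = Seq(
      """{"rootElement": {"firstName": "John", "lastName": "Doe", "age": 11}}""",
      """{"rootElement": {"firstName": "Jane", "lastName": "Doe", "age": 33}}""",
      """{"rootElement": {"firstName": "Scott", "lastName": "Smith", "age": 22}}"""
    ).toDS()

    val flat = flatten(spark, input)
    flat.printSchema()
    // Select the columns explicitly to control the field order in the output records.
    flat.select("firstName", "lastName", "age").toJSON.collect().foreach(println)

    spark.stop()
  }
}
```

The explicit select at the end is there because the field order of an inferred schema may differ from the order in the input file; if the order does not matter, it can be dropped.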

CodePudding user response:

If your JSON data is malformed, it needs to be fixed first, and rootElement can be removed in the same pass. Here I load the data as an RDD of text lines and fix it there.

// Load the JSON file as an RDD of text lines
val rdd_data = sc.textFile("file:///D:/myfile2.json")
// Remove the rootElement wrapper: drop the opening {"rootElement": and the extra closing brace
val pro_data = rdd_data.map(x => x.replace("{\"rootElement\":", ""))
val pro2 = pro_data.map(x => x.replace("}}", "}"))
// Add the colon after "age" only where it is missing; well-formed lines are left unchanged
val pro3 = pro2.map(x => x.replaceAll("\"age\"(?!\\s*:)", "\"age\":"))

// Print the fixed lines to check that everything is all right
pro3.foreach(println)

// spark.read.json accepts the RDD[String] directly and returns a DataFrame
val df = spark.read.json(pro3)
df.show()
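
Replacing every "}}" is fragile: it would also alter records that contain nested objects. As a hedged alternative, the sketch below removes the wrapper only when the whole line has the form {"rootElement": {...}}. The object name RootUnwrapper and the function unwrap are hypothetical, and the regex assumes one record per line, as in the sample input.

```scala
object RootUnwrapper {
  // Matches a whole line of the form {"rootElement": {...}} and captures the inner object.
  private val Wrapper = """^\s*\{\s*"rootElement"\s*:\s*(\{.*\})\s*\}\s*$""".r

  // Returns the inner object if the line is wrapped; otherwise returns the line unchanged.
  def unwrap(line: String): String = line match {
    case Wrapper(inner) => inner
    case other          => other
  }

  def main(args: Array[String]): Unit = {
    val sample = """{"rootElement": {"firstName": "John", "lastName": "Doe", "age": 11}}"""
    println(unwrap(sample))
  }
}
```

In the RDD pipeline above this would replace the first two replace steps, for example rdd_data.map(RootUnwrapper.unwrap). When the input is valid JSON, selecting rootElement.* on the DataFrame is still the simpler route.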
