In my input JSON data, each record has a root element that wraps the data I actually want. I would like to remove that wrapper and keep only the target data in each record.
Input data:
{"rootElement": {"firstName": "John", "lastName": "Doe", "age": 11}}
{"rootElement": {"firstName": "Jane", "lastName": "Doe", "age": 33}}
{"rootElement": {"firstName": "Scott", "lastName": "Smith", "age": 22}}
Expected output:
{"firstName": "John", "lastName": "Doe", "age": 11}
{"firstName": "Jane", "lastName": "Doe", "age": 33}
{"firstName": "Scott", "lastName": "Smith", "age": 22}
I tried this so far:
sparkSession.read.json(inputFileLocation).toDF().map(func => func.getObject("rootElement"))
but it won't compile.
CodePudding user response:
sparkSession.read.json already returns a DataFrame, so there is no need to call toDF().
Try this:
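import sparkSession.implicits._ // needed for the $"..." column syntax
val df = sparkSession.read.json("your path")
// "rootElement.*" expands the struct into its fields as top-level columns
df.select($"rootElement.*").write.json("your output path")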
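To see the select on "rootElement.*" in isolation, here is a minimal sketch that can be pasted into spark-shell or run as a Scala script. It assumes a local session and feeds the three sample records from the question in as an in-memory Dataset instead of a file; the names session, input, wrapped and unwrapped are only illustrative. As a side note, Row has no getObject method, which is likely why the original attempt did not compile.

```scala
import org.apache.spark.sql.{Encoders, SparkSession}

// Assumption: a throwaway local session; in a real job reuse your existing sparkSession.
val session = SparkSession.builder()
  .master("local[1]")
  .appName("unwrap-root-element")
  .getOrCreate()

// The sample records from the question, one JSON document per element.
val input = session.createDataset(Seq(
  """{"rootElement": {"firstName": "John", "lastName": "Doe", "age": 11}}""",
  """{"rootElement": {"firstName": "Jane", "lastName": "Doe", "age": 33}}""",
  """{"rootElement": {"firstName": "Scott", "lastName": "Smith", "age": 22}}"""
))(Encoders.STRING)

// Each record is parsed into a single struct column named rootElement.
val wrapped = session.read.json(input)

// "rootElement.*" promotes every field of the struct to a top-level column.
val unwrapped = wrapped.select("rootElement.*")

// Serialize back to one JSON string per record and print them.
val lines = unwrapped.toJSON.collect()
lines.foreach(println)
```

Spark infers the schema here, so the order of the fields in the printed JSON may differ from the order in the input.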
CodePudding user response:
Your JSON data is malformed and needs to be fixed first; you can remove rootElement in the same pass. I load the data as an RDD of text lines and fix it there.
// Load the JSON file as an RDD of text lines
val rdd_data = sc.textFile("file:///D:/myfile2.json")
// Remove the rootElement wrapper: drop the opening {"rootElement": and the extra closing }
val pro_data = rdd_data.map(x => x.replace("{\"rootElement\":", ""))
val pro2 = pro_data.map(x => x.replace("}}", "}"))
// Add the missing colon after "age"; skip this step if your data already has it
val pro3 = pro2.map(x => x.replace("\"age\"", "\"age\":"))
// Print the lines to check that everything looks right
pro3.foreach(println)
// spark.read.json accepts the RDD of JSON strings directly, so no separate conversion is needed
val df = spark.read.json(pro3)
df.show()
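One caveat with replacing every "}}" is that it would also alter records that contain nested objects. A slightly more defensive sketch of the same string-level idea only touches the start and the end of each line. The helper name unwrapRoot is hypothetical, and it assumes every line starts with exactly {"rootElement": (no extra whitespace inside the prefix); lines that do not match are returned unchanged.

```scala
// Hypothetical helper: strips the {"rootElement": ... } wrapper from one line of text.
def unwrapRoot(line: String): String = {
  val prefix = "{\"rootElement\":"
  val trimmed = line.trim
  if (trimmed.startsWith(prefix) && trimmed.endsWith("}"))
    // Drop the wrapper's opening part and its single closing brace.
    trimmed.stripPrefix(prefix).stripSuffix("}").trim
  else
    trimmed // not wrapped as expected; leave the line alone
}

val sample = """{"rootElement": {"firstName": "John", "lastName": "Doe", "age": 11}}"""
println(unwrapRoot(sample))
```

With the RDD from the answer above this could be applied as rdd_data.map(unwrapRoot) in place of the two replace calls that strip the wrapper.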