not able to get nested json value as column

Time:01-11

I'm trying to create a schema for JSON and see it as columns in a DataFrame.

Input json

{"place":{"place_name":"NYC","lon":0,"lat":0,"place_id":1009}, "region":{"region_issues":[{"key":"health","issue_name":"Cancer"},{"key":"sports","issue_name":"swimming"}]}}

code

  import org.apache.spark.sql.types._

  val schemaRsvp = new StructType()
    .add("place", StructType(Array(
      StructField("place_name", DataTypes.StringType),
      StructField("lon", DataTypes.IntegerType),
      StructField("lat", DataTypes.IntegerType),
      StructField("place_id", DataTypes.IntegerType))))

  val ip = spark.read.schema(schemaRsvp).json("D:\\Data\\rsvp\\inputrsvp.json")
  ip.show()

It's showing all the fields in a single column `place`, but I want the values column-wise:

place_name,lon,lat,place_id
NYC,0,0,1009

Any suggestion on how to fix this?

CodePudding user response:

You can expand a struct column into individual columns using ".*":

ip.select("place.*").show()

 +----------+---+---+--------+
 |place_name|lon|lat|place_id|
 +----------+---+---+--------+
 |       NYC|  0|  0|    1009|
 +----------+---+---+--------+
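As an aside, if you don't want to maintain a schema by hand, Spark can infer one from the JSON file and the same ".*" flattening applies. A minimal sketch, assuming an existing SparkSession named `spark` and the asker's input path:

```scala
// Let Spark infer the nested schema from the JSON file,
// then flatten the `place` struct into top-level columns.
val inferred = spark.read.json("D:\\Data\\rsvp\\inputrsvp.json")
inferred.printSchema()          // shows place and region as nested structs
inferred.select("place.*").show()
```

Inference costs an extra pass over the data, so an explicit schema is still preferable for large inputs.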

UPDATE:

With the new array column you can explode your data and then do the same ".*" to convert the struct type into columns (note that the schema must also cover `region` for this column to exist):

import org.apache.spark.sql.functions.{col, explode}

ip.select(col("place"), explode(col("region.region_issues")).as("region_issues"))
  .select("place.*", "region_issues.*").show(false)

 +---+---+--------+----------+----------+------+
 |lat|lon|place_id|place_name|issue_name|key   |
 +---+---+--------+----------+----------+------+
 |0  |0  |1009    |NYC       |Cancer    |health|
 |0  |0  |1009    |NYC       |swimming  |sports|
 +---+---+--------+----------+----------+------+
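The asker's original schema only declares `place`, so for the explode above to work the schema needs a `region` entry as well. A sketch of the extended schema, with field names mirroring the input JSON (the variable names here are illustrative):

```scala
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types._

// Schema covering both `place` (struct) and `region.region_issues`
// (array of structs), matching the shape of the input JSON.
val schemaFull = new StructType()
  .add("place", new StructType()
    .add("place_name", StringType)
    .add("lon", IntegerType)
    .add("lat", IntegerType)
    .add("place_id", IntegerType))
  .add("region", new StructType()
    .add("region_issues", ArrayType(new StructType()
      .add("key", StringType)
      .add("issue_name", StringType))))

val df = spark.read.schema(schemaFull).json("D:\\Data\\rsvp\\inputrsvp.json")

// One output row per element of the array, flattened column-wise.
df.select(col("place"), explode(col("region.region_issues")).as("region_issues"))
  .select("place.*", "region_issues.*")
  .show(false)
```

`explode` produces one row per array element, which is why the place columns repeat for each issue in the output above.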