I'm trying to create a schema for JSON and see it as columns in a DataFrame.
Input JSON:
{"place":{"place_name":"NYC","lon":0,"lat":0,"place_id":1009}, "region":{"region_issues":[{"key":"health","issue_name":"Cancer"},{"key":"sports","issue_name":"swimming"}]}}
Code:
val schemaRsvp = new StructType()
  .add("place", StructType(Array(
    StructField("place_name", DataTypes.StringType),
    StructField("lon", DataTypes.IntegerType),
    StructField("lat", DataTypes.IntegerType),
    StructField("place_id", DataTypes.IntegerType))))
val ip = spark.read.schema(schemaRsvp).json("D:\\Data\\rsvp\\inputrsvp.json")
ip.show()
It's showing all the fields in a single column "place", but I want the values column-wise:
place_name,lon,lat,place_id
NYC,0,0,1009
Any suggestion on how to fix this?
CodePudding user response:
You can expand a struct-type column into separate columns using ".*":
ip.select("place.*").show()
+----------+---+---+--------+
|place_name|lon|lat|place_id|
+----------+---+---+--------+
|       NYC|  0|  0|    1009|
+----------+---+---+--------+
UPDATE:
Since region_issues is an array column, you can explode your data and then apply the same ".*" to expand the struct type into columns:
ip.select(col("place"), explode(col("region.region_issues")).as("region_issues"))
.select("place.*", "region_issues.*").show(false)
+---+---+--------+----------+----------+------+
|lat|lon|place_id|place_name|issue_name|key   |
+---+---+--------+----------+----------+------+
|0  |0  |1009    |NYC       |Cancer    |health|
|0  |0  |1009    |NYC       |swimming  |sports|
+---+---+--------+----------+----------+------+
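Note that for "region.region_issues" to be readable at all, the schema passed to spark.read must also declare the "region" struct and its array of issue structs; the original schema only covers "place". A sketch of the extended schema, assuming the field names and types from the sample JSON above:

```scala
import org.apache.spark.sql.types._

// Extended schema covering both "place" and "region".
// Field names are taken from the sample JSON; adjust types to your data.
val schemaRsvp = new StructType()
  .add("place", new StructType()
    .add("place_name", DataTypes.StringType)
    .add("lon", DataTypes.IntegerType)
    .add("lat", DataTypes.IntegerType)
    .add("place_id", DataTypes.IntegerType))
  .add("region", new StructType()
    .add("region_issues", ArrayType(new StructType()
      .add("key", DataTypes.StringType)
      .add("issue_name", DataTypes.StringType))))
```

With this schema in place, the explode + ".*" select shown above produces the flattened columns.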