I have the following Json file that I'm reading into a dataframe.
{
"details": {
"box": [
{
"Touchdowns": "123",
"field": "Texans"
},
{
"Touchdowns": "456",
"field": "Ravens"
}
]
},
"name": "Team"
}
How could I manipulate this to get the following output?
Team | Touchdowns |
---|---|
Texans | 123 |
Ravens | 456 |
I'm struggling a bit with whether I need to pivot/transpose the data or if there is a more elegant approach.
CodePudding user response:
Read the multiline json into spark
df = spark.read.json('/path/to/scores.json',multiLine=True)
Schema
df:pyspark.sql.dataframe.DataFrame
details:struct
box:array
element:struct
Touchdowns:string
field:string
name:string
All of the info you want is in the first row, so get that and drill down to details and box and make that your new dataframe.
spark.createDataFrame(df.first()['details']['box']).withColumnRenamed('field','Team').show()
Output
---------- ------
|Touchdowns| Team|
---------- ------
| 123|Texans|
| 456|Ravens|
---------- ------
CodePudding user response:
You can use the inline
function.
df = spark.read.load(json_file_path, format='json', multiLine=True)
df = df.selectExpr('inline(details.box)').withColumnRenamed('field', 'Team')
df.show(truncate=False)
CodePudding user response:
You can try using a rdd
to get the values of box
list
.
Input JSON
jsonstr="""{
"details": {
"box": [
{
"Touchdowns": "123",
"field": "Texans"
},
{
"Touchdowns": "456",
"field": "Ravens"
}
]
},
"name": "Team"
}"""
Now convert it to an rdd
using the keys of dictionary
as below -
import json
box_rdd = sc.parallelize(json.loads(jsonstr)['details']['box'])
box_rdd.collect()
Output - [{'Touchdowns': '123', 'field': 'Texans'},
{'Touchdowns': '456', 'field': 'Ravens'}]
Finally create the dataframe with this box_rdd
as below -
from pyspark.sql.types import *
schema = StructType([StructField('Touchdowns', StringType(), True), StructField('field', StringType(), True)])
df = spark.createDataFrame(data=box_rdd,schema=schema)
df.show()
---------- ------
|Touchdowns| field|
---------- ------
| 123|Texans|
| 456|Ravens|
---------- ------