I am working with the following marketing JSON file:
{
    "request_id": "xx",
    "timeseries_stats": [
        {
            "timeseries_stat": {
                "id": "xx",
                "timeseries": [
                    {
                        "start_time": "xx",
                        "end_time": "xx",
                        "stats": {
                            "impressions": xx,
                            "swipes": xx,
                            "view_completion": xx,
                            "spend": xx
                        }
                    },
                    {
                        "start_time": "xx",
                        "end_time": "xx",
                        "stats": {
                            "impressions": xx,
                            "swipes": xx,
                            "view_completion": xx,
                            "spend": xx
                        }
                    }
                ]
            }
        }
    ]
}
I can parse this very easily using pandas and obtain a dataframe in the desired format:
start_time end_time impressions swipes view_completion spend
xx xx xx xx xx xx
xx xx xx xx xx xx
but I need to do it in Spark on AWS Glue.
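(For reference, a minimal sketch of the pandas side, assuming json_normalize; the exact call used may differ:)

import pandas as pd

# json_normalize flattens the nested stats dict into dotted columns
# (start_time, end_time, stats.impressions, stats.swipes, ...)
pdf = pd.json_normalize(
    JSON_resp['timeseries_stats'][0]['timeseries_stat']['timeseries']
)
# strip the "stats." prefix to match the desired layout
pdf.columns = [c.replace('stats.', '') for c in pdf.columns]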
After creating an initial Spark dataframe (df) using
rdd = sc.parallelize(JSON_resp['timeseries_stats'][0]['timeseries_stat']['timeseries'])
df = rdd.toDF()
I tried expanding the stats key as follows:
df_expanded = df.select("start_time","end_time","stats.*")
Error:
AnalysisException: 'Can only star expand struct data types.
Attribute: `ArrayBuffer(stats)`;'
and:
from pyspark.sql.functions import explode
df_expanded = df.select("start_time","end_time").withColumn("stats", explode(df.stats))
Error:
AnalysisException: 'The number of aliases supplied in the AS clause does not match the
number of columns output by the UDTF expected 2 aliases but got stats ;
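In case it's useful, the schema that rdd.toDF() inferred can be inspected with:

# show what rdd.toDF() inferred for each column, in particular the stats column
df.printSchema()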
I'm pretty new to Spark, so any help with either of the two approaches would be much appreciated!
It's a pretty similar problem to the one in:
parse array of dictionaries from JSON with Spark
except that I need to flatten the additional stats key.
CodePudding user response:
When you explode a map column, it gives you two columns (key and value), so .withColumn does not work here. Use explode inside a select statement.
import pyspark.sql.functions as f

# explode the stats map into (key, value) rows, then pivot the keys back into columns
df.select('start_time', 'end_time', f.explode('stats')) \
    .groupBy('start_time', 'end_time').pivot('key').agg(f.first('value')).show()
+----------+--------+-----------+-----+------+---------------+
|start_time|end_time|impressions|spend|swipes|view_completion|
+----------+--------+-----------+-----+------+---------------+
|        yy|      yy|         yy|   yy|    yy|             yy|
|        xx|      xx|         xx|   xx|    xx|             xx|
+----------+--------+-----------+-----+------+---------------+
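As a possible variation (a sketch, not part of the original answer): since stats is a map column, the individual metrics can also be pulled out directly with getItem, which skips the groupBy/pivot round trip. The metric names below are just the ones from the JSON in the question.

import pyspark.sql.functions as f

# pull each metric straight out of the stats map into its own column
metrics = ['impressions', 'swipes', 'view_completion', 'spend']
df_expanded = df.select(
    'start_time',
    'end_time',
    *[f.col('stats').getItem(m).alias(m) for m in metrics]
)
df_expanded.show()

This keeps one row per interval without re-aggregating, at the cost of hard-coding the metric names.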