I am using Spark SQL to create grouping sets with total columns, to reduce the size of a file which I have previously cubed.
My code is as follows:
final_sum = sqlContext.sql('''
SELECT case when GROUPING(x_flag)=1 then 'total' else x_flag END AS x_flag,
...
from trans,
group by x_flag,...,
GROUPING SETS(
(x_flag,y_name)
...
)
''')
My column x_flag contains the integer values 0 and 1, and I am mainly trying to get grouping sets based on that column.
I currently get the following error:
mismatched input 'x_flag' expecting {<EOF>, ';'}(line 14, pos 11)
Is there a way to fix this without having to reflag the column as 'y' and 'n', or is there some error I am missing when using integers in a CASE WHEN?
CodePudding user response:
I could not reproduce the error, but I have created a working example for you.
df = spark.createDataFrame(
[(0, 0),
(0, 1),
(1, 0),
(1, 1)],
['x_flag', 'y_name'])
df.createOrReplaceTempView("trans")
spark.sql("""
SELECT
CASE when GROUPING(x_flag)=1 then 'total' else x_flag END AS x_flag,
CASE when GROUPING(y_name)=1 then 'total' else y_name END AS y_name,
count(*)
FROM trans
GROUP BY
GROUPING SETS(
(x_flag, y_name),
(x_flag),
(y_name)
)
""").show()
# +------+------+--------+
# |x_flag|y_name|count(1)|
# +------+------+--------+
# |     0|     0|       1|
# |     0| total|       2|
# | total|     0|       2|
# | total|     1|       2|
# |     0|     1|       1|
# |     1| total|       2|
# |     1|     0|       1|
# |     1|     1|       1|
# +------+------+--------+
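To see what this query computes without needing a Spark session, here is a plain-Python sketch of the same grouping-sets aggregation. The rows and the three grouping sets mirror the toy trans data above; columns left out of a set are rolled up into 'total', exactly as the CASE WHEN GROUPING(...) expressions do:

```python
from collections import Counter

# Rows of the toy trans table: (x_flag, y_name)
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]

# The three grouping sets from the query, as column indices kept in each set:
# (x_flag, y_name), (x_flag), (y_name)
grouping_sets = [(0, 1), (0,), (1,)]

counts = Counter()
for keep in grouping_sets:
    for row in rows:
        # A column not in the current set is rolled up, i.e. shown as 'total'
        key = tuple(row[i] if i in keep else 'total' for i in range(2))
        counts[key] += 1

for key, n in sorted(counts.items(), key=str):
    print(key, n)
```

This prints the same eight (x_flag, y_name, count) combinations as the Spark output above; note there is no ('total', 'total') grand-total row, because the empty grouping set () is not listed.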
Make sure you use correct syntax: in your question, I noticed a comma after the FROM clause. Also, GROUP BY doesn't seem to require additional arguments when you use GROUPING SETS.
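Applied to the query in the question, the corrected SQL would look something like the sketch below (the elided column lists are kept as in your question, so adjust to your schema):

```sql
SELECT
  CASE WHEN GROUPING(x_flag) = 1 THEN 'total' ELSE x_flag END AS x_flag,
  ...
FROM trans          -- no trailing comma after the FROM clause
GROUP BY
  GROUPING SETS(    -- no column list between GROUP BY and GROUPING SETS
    (x_flag, y_name)
    ...
  )
```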