I have following table:
Id col1 col2
1 a 1
1 b 2
1 c 3
2 a 1
2 e 3
2 f 4
Expected output is:
Id col3
1 a1b2c3
2 a1e3f4
The aggregation computation involves 2 columns, is this supported in SQL?
CodePudding user response:
In Spark SQL you can do it like this:
SELECT Id, aggregate(list, '', (acc, x) -> concat(acc, x)) col3
FROM (SELECT Id, array_sort(collect_list(concat(col1, col2))) list
FROM df
GROUP BY Id )
or in one select:
SELECT Id, aggregate(array_sort(collect_list(concat(col1, col2))), '', (acc, x) -> concat(acc, x)) col3
FROM df
GROUP BY Id
Higher-order aggregate
function is used in this example.
aggregate(expr, start, merge, finish) - Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying a finish function.