Group by id and flatten arrays of arrays in ClickHouse / Spark SQL / PySpark


Given the table A below:

id    units
 1    [1,1,1]
 2    [3,0,0]
 1    [5,3,7]
 3    [2,5,2]
 2    [3,2,6]

I would like to run a query along these lines (pseudocode):

select id, array_append(units) from A group by id

And get the following result:

id    units
 1    [1,1,1,5,3,7]
 2    [3,0,0,3,2,6]
 3    [2,5,2]

CodePudding user response:

To flatten the lists after grouping in PySpark, collect the per-id arrays with collect_list and then flatten the result:

  from pyspark.sql import functions as func

  df.groupby('id').agg(func.flatten(func.collect_list('units')).alias('units')).show(10, False)
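For context, here is a minimal, self-contained sketch that rebuilds the sample data from the question and also shows the equivalent Spark SQL expression. The session setup and the temp-view name A are illustrative assumptions, not part of the original answer.

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as func

  # Build a throwaway session and the sample data from the question.
  spark = SparkSession.builder.getOrCreate()
  df = spark.createDataFrame(
      [(1, [1, 1, 1]), (2, [3, 0, 0]), (1, [5, 3, 7]), (3, [2, 5, 2]), (2, [3, 2, 6])],
      ["id", "units"],
  )

  # collect_list gathers the per-row arrays into an array of arrays;
  # flatten then concatenates them into a single array per id.
  df.groupBy("id") \
    .agg(func.flatten(func.collect_list("units")).alias("units")) \
    .orderBy("id") \
    .show(10, False)

  # Spark SQL equivalent (flatten and collect_list are built-ins in Spark 2.4+).
  df.createOrReplaceTempView("A")
  spark.sql(
      "SELECT id, flatten(collect_list(units)) AS units FROM A GROUP BY id ORDER BY id"
  ).show(10, False)

Note that collect_list does not guarantee the order in which rows are collected across partitions, so the element order inside each concatenated array may vary between runs.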

CodePudding user response:

For ClickHouse, use the groupArray aggregate function with the -Array combinator (i.e. groupArrayArray):

SELECT
    id,
    groupArrayArray(units)
FROM
(
    SELECT
        data.1 AS id,
        data.2 AS units
    FROM
    (
        SELECT arrayJoin([(1, [1, 1, 1]), (2, [3, 0, 0]), (1, [5, 3, 7]), (3, [2, 5, 2]), (2, [3, 2, 6])]) AS data
    )
)
GROUP BY id

/*
┌─id─┬─groupArrayArray(units)─┐
│  1 │ [1,1,1,5,3,7]          │
│  2 │ [3,0,0,3,2,6]          │
│  3 │ [2,5,2]                │
└────┴────────────────────────┘
*/
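If the data already lives in a real table, the same aggregate applies directly. A minimal sketch, assuming a table A with an Array(UInt32) column named units (table definition and engine are illustrative assumptions):

CREATE TABLE A (id UInt32, units Array(UInt32)) ENGINE = Memory;

INSERT INTO A VALUES (1, [1,1,1]), (2, [3,0,0]), (1, [5,3,7]), (3, [2,5,2]), (2, [3,2,6]);

-- groupArrayArray is groupArray with the -Array combinator:
-- it concatenates the input arrays instead of collecting an array of arrays.
SELECT
    id,
    groupArrayArray(units) AS units
FROM A
GROUP BY id
ORDER BY id;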