How to Group By using the distinct elements of an Array type column in Google BigQuery?-CodePudding

I have two tables Table_1 and Table_2 in the database as below:

The datatype of Name column is STRING and that of Matches column is ARRAY<INT64>. For each name, there can be one or more matches. For example, in Table_1, N1 has matches [1, 2]. I want to combine both Table_1 and Table_2, and generate a Result table like below:

In the Result table, the distinct match ID will come from the Matches column of both the tables and the Names column will show the names from both table related to the specific ID. For example, In Table_1, both N1 and N2 has a match with 1. So, in the Result table, we will have a row for 1 with N1, N2 as a comma separated in the Names column.

How can I achieve using Group BY and Aggregation in Google BigQuery?

CodePudding user response：

Consider below approach

select match, string_agg(name) names from (
  select name, match from table_1 t, t.matches match
  union all
  select name, match from table_2 t, t.matches match
)
group by match

if applied to sample data in your question - output is

CodePudding user response：

WITH Table_1 AS  
(
  SELECT 'N1' AS Name, [1,2] AS Matches UNION ALL
  SELECT 'N2' AS Name, [1,4] AS Matches UNION ALL
  SELECT 'N3' AS Name, [2,3] AS Matches
),
Table_2 AS  
(
  SELECT 'N2' AS Name, [10,11] AS Matches UNION ALL
  SELECT 'N4' AS Name, [12] AS Matches UNION ALL
  SELECT 'N5' AS Name, [10] AS Matches
)
SELECT
    Match,
    STRING_AGG(Name,', ' ORDER BY Name ASC) AS Names
FROM
    (
        SELECT
            Name,
            Match
        FROM 
            Table_1, 
            UNNEST(Table_1.Matches) AS Match
        UNION ALL
        SELECT
            Name,
            Match
        FROM 
            Table_2, 
            UNNEST(Table_2.Matches) AS Match
    )
GROUP BY
    Match

A slight addition to STRING_AGG to explicitly specify delimiter format as requested in the question.

Query Output