I want to join tables after unnest arrays in Table:1 but the records duplicated after the join because of the unnest.
Table:1
| a | d.b | d.c |
-----------------
| 1 | 5 | 2 |
- -------------
| | 3 | 1 |
-----------------
| 2 | 2 | 1 |
Table:2
| a | c | f |
-----------------
| 1 | 12 | 13 |
-----------------
| 2 | 14 | 15 |
I want to join table 1 and 2 on a
but I need also to have the output of:
| a | d.b | d.c | f | h | Sum(count(a))
---------------------------------------------
| 1 | 5 | 2 | 13 | 12 |
- ------------- - - 1
| | 3 | 1 | | |
---------------------------------------------
| 2 | 2 | 1 | 15 | 14 | 1
a
can be repeated in table 2 for that I need to count(a) then select the sum after join.
My problem is when I'm joining I need the nested and repeated record to be the same as in the first table but when use aggregation to get the sum I can't group by struct or arrays so I UNNEST
the records first then use ARRAY_AGG
function but also there was an issue in the sum.
SELECT
t1.a,
t2.f,
t2.h,
ARRAY_AGG(DISTINCT(t1.db)) as db,
ARRAY_AGG(DISTINCT(t1.dc)) as dc,
SUM(t2.total) AS total
FROM (
SELECT
a,
d.b as db,
d.c as dc
FROM
`table1`,
UNNEST(d) AS d,
) AS t1
LEFT JOIN (
SELECT
a,
f,
h,
COUNT(*) AS total,
FROM
`table2`
GROUP BY
a,f,h) AS t2
ON
t1.a = t2.a
GROUP BY
1,
2,
3
Note: the error is in the total number after the sum it is much higher than expected all other data are correct.
CodePudding user response:
I guess your table 2 contains is not unique for column a
.
Lets assume that the table 2 looks like this:
a | c | f |
---|---|---|
1 | 12 | 13 |
2 | 14 | 15 |
1 | 100 | 101 |
There are two rows where a
is 1
. Since b
and f
are different, the grouping does not solve this ( GROUP BY a,f,h) AS t2
) and counts(*) as total
is one for each row.
a | c | f | total |
---|---|---|---|
1 | 12 | 13 | 1 |
2 | 14 | 15 | 1 |
1 | 100 | 101 | 1 |
In the next step you join this table to your table 1. The rows of table1 with value 1
in column a
are duplicated, because table2 has two entries. This lead to the fact that the sum is too high.
Instead of unnesting the tables, I recommend following approach:
-- Creating of sample data as given:
with tbl_A as (select 1 a, [struct(5 as b,2 as c),struct(3,1)] d union all select 2,[struct(2,1)] union all select null,[struct(50,51)]),
tbl_B as (select 1 as a,12 b, 13 f union all select 2,14,15 union all select 1,100,101 union all select null,500,501)
-- Query:
select *
from tbl_A A
left join
(Select a,array_agg(struct(b,f)) as B, count(1) as counts from tbl_B group by 1) B
on ifnull(A.a,-9)=ifnull(B.a,-9)