I have a BigQuery table with the following format:
person | friends.name | friends.year |
---|---|---|
John | Mary | 1977 |
Mike | 1984 | |
Mary | John | 1980 |
Mike | John | 1977 |
Jane | 1971 |
I want to compute, for each person
, the maximum year in a separate column, and also for each friends
record I would like to get the number of friends that each of the friends has (which would be achieved either with a self join, or with a window function).
I am not sure how to write this query, my approach so far has been:
SELECT person,
ARRAY(SELECT AS STRUCT f.name, f.year FROM UNNEST (Friends) f),
ARRAY_LENGTH(friends) AS number_friends
FROM table
However, this does not compute the number of friends for each array struct value. This is the output I am expecting:
person | friends.name | friends.year | friends.num_friends | max_year |
---|---|---|---|---|
John | Mary | 1977 | 1 | 1984 |
Mike | 1984 | 2 | ||
Mary | John | 1980 | 2 | 1980 |
Mike | John | 1977 | 2 | 1977 |
Jane | 1971 | 0 |
How can I write this query in an optimised way?
CodePudding user response:
Consider below approach
with friends_count as (
select person, ifnull(num_friends, 0) num_friends from (
select distinct name as person
from your_table, unnest(friends)
) left join (
select person, array_length(friends) num_friends
from your_table
) using(person)
)
select person, array(
select as struct name, year, ifnull(num_friends, 0) num_friends
from t.friends join friends_count on name = person
) friends,
(select max(year) from t.friends) max_year
from your_table t
if applied to sample data in your question - output is