How to compute number of friends of friends in a BigQuery table with repeated records?-CodePudding

I have a BigQuery table with the following format:

person	friends.name	friends.year
John	Mary	1977
	Mike	1984
Mary	John	1980
Mike	John	1977
	Jane	1971

I want to compute, for each person, the maximum year in a separate column, and also for each friends record I would like to get the number of friends that each of the friends has (which would be achieved either with a self join, or with a window function).

I am not sure how to write this query, my approach so far has been:

SELECT person, 
   ARRAY(SELECT AS STRUCT f.name, f.year FROM UNNEST (Friends) f), 
   ARRAY_LENGTH(friends) AS number_friends
FROM table

However, this does not compute the number of friends for each array struct value. This is the output I am expecting:

person	friends.name	friends.year	friends.num_friends	max_year
John	Mary	1977	1	1984
	Mike	1984	2
Mary	John	1980	2	1980
Mike	John	1977	2	1977
	Jane	1971	0

How can I write this query in an optimised way?

CodePudding user response：

Consider below approach

with friends_count as (
  select person, ifnull(num_friends, 0) num_friends from (
  select distinct name as person
  from your_table, unnest(friends)
  ) left join (
    select person, array_length(friends) num_friends
    from your_table
  ) using(person)
)
select person, array( 
    select as struct name, year, ifnull(num_friends, 0) num_friends
    from t.friends join friends_count on name = person
  ) friends,
  (select max(year) from t.friends) max_year
from your_table t

if applied to sample data in your question - output is