Home > OS >  How to compute number of friends of friends in a BigQuery table with repeated records?
How to compute number of friends of friends in a BigQuery table with repeated records?

Time:02-22

I have a BigQuery table with the following format:

person friends.name friends.year
John Mary 1977
Mike 1984
Mary John 1980
Mike John 1977
Jane 1971

I want to compute, for each person, the maximum year in a separate column, and also for each friends record I would like to get the number of friends that each of the friends has (which would be achieved either with a self join, or with a window function).

I am not sure how to write this query, my approach so far has been:

SELECT person, 
   ARRAY(SELECT AS STRUCT f.name, f.year FROM UNNEST (Friends) f), 
   ARRAY_LENGTH(friends) AS number_friends
FROM table

However, this does not compute the number of friends for each array struct value. This is the output I am expecting:

person friends.name friends.year friends.num_friends max_year
John Mary 1977 1 1984
Mike 1984 2
Mary John 1980 2 1980
Mike John 1977 2 1977
Jane 1971 0

How can I write this query in an optimised way?

CodePudding user response:

Consider below approach

with friends_count as (
  select person, ifnull(num_friends, 0) num_friends from (
  select distinct name as person
  from your_table, unnest(friends)
  ) left join (
    select person, array_length(friends) num_friends
    from your_table
  ) using(person)
)
select person, array( 
    select as struct name, year, ifnull(num_friends, 0) num_friends
    from t.friends join friends_count on name = person
  ) friends,
  (select max(year) from t.friends) max_year
from your_table t    

if applied to sample data in your question - output is

enter image description here

  • Related