Home > Software design >  BigQuery SQL: Sum of first N related items
BigQuery SQL: Sum of first N related items

Time:04-30

I would like to know the sum of a value in the first n items in a related table. For example, I want to get the sum of a companies first 6 invoices (the invoices can be sorted by ID asc)

Current SQL:

SELECT invoices.company_id, SUM(invoices.amount)
FROM invoices
JOIN companies on invoices.company_id = companies.id
GROUP BY invoices.company_id

This seems simple but I can't wrap my head around it.

CodePudding user response:

You can create order row numbers to the lines in a partition based on invoice id and filter to it, something like this:

with array_table as (
 select 'a' field, * from unnest([3, 2, 1 ,4, 6, 3]) id
 union all
 select 'b' field, * from unnest([1, 2, 1, 7]) id
)

select field, sum(id) from (
   select field, id, row_number() over (partition by a.field order by id desc) rownum
   from array_table a
) 
where rownum < 3
group by field

More examples for analytical examples here:

https://medium.com/@aliz_ai/analytic-functions-in-google-bigquery-part-1-basics-745d97958fe2

https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts

CodePudding user response:

Consider also below approach

select company_id, (
    select sum(amount)
    from t.amounts amount
  ) as top_six_invoices_amount
from (
  select invoices.company_id, 
    array_agg(invoices.amount order by invoices.invoice_id limit 6) amounts
  from your_table invoices
  group by invoices.company_id
) t
  • Related