Home > OS >  Multiple aggregations for the same data
Multiple aggregations for the same data

Time:05-04

I have two queries:

SELECT 
    COUNT(warehouseCode) as count,
    warehouseCode
  FROM 'sparePartEventsTable'
  WHERE 
    sparePartCode = 'SP.0000' AND 
    sparePartConsumed = 'true' 
  GROUP BY warehouseCode
  ORDER BY count DESC

and

  SELECT 
    COUNT(*) as eventsCount,
    DATE(TIMESTAMP_SECONDS(epochSeconds)) as day
  FROM 'sparePartEventsTable'
  WHERE 
    sparePartCode = 'SP.0000' AND 
    sparePartConsumed = 'true' 
  GROUP BY day
  ORDER BY day  

As you can see the underlying data is the same, but I'm returning two different aggregations. Is there a way in SQL to avoid hitting the disk twice?

How would you implement this is either BigQuery or Postgres?

In mongodb I would build a cursor on the underlying common data, and then write an aggregation pipeline that spits two results.

EDIT: It seems that UNION ALL could be a first solution, but at least in postgres it scans twice. Sergey in the comment suggested GROUPING SETS, but unfortunately they are not available in BigQuery. An answer in either Postgres or BigQuery dialect will be accepted, extra points if both solutions are posted :)

CodePudding user response:

Consider below (BigQuery) option

with temp as (
  select 
    warehouseCode, 
    count(*) over(partition by warehouseCode) as `count`,
    date(timestamp_seconds(epochSeconds)) as day,
    count(*) over(partition by unix_date(date(timestamp_seconds(epochSeconds)))) as eventsCount
  from `sparePartEventsTable`
  where sparePartCode = 'SP.0000' 
  and sparePartConsumed = 'true' 
)
select distinct 'warehouseCode' type, '' || warehouseCode as value, `count` from temp union all
select distinct 'day', string(day), eventsCount from temp

with output like below

enter image description here

  • Related