Postgres: counting records in two groups (existing foreign key or null)-CodePudding

I have a table items and a table batches. A batch can have n items associated by items.batch_id.

I'd like to write a query item counts in two groups batched and unbatched:

items WHERE batch_id IS NOT NULL (batched)
items WHERE batch_id IS NULL (unbatched)

The result should look like this

batched	unbatched
1200000	100

Any help appreciated, thank you!

CodePudding user response：

You can use COUNT with `FILTER( WHERE)

it is called conditional count

CREATE TABLE items(item_id int, batch_id int)

CREATE TABLE

INSERT INTO items VALUEs(1,NULL),(2,NULL),(3,1)

INSERT 0 3

CREATE tABLe batch (batch_id int)

CREATE TABLE

select 
       count(*) filter (WHERE batch_id IS NOT NULL ) as "matched"
  ,
         count(*) filter (WHERE batch_id IS NULL ) as "unmatched"
from items

matched	unmatched
1	2

SELECT 1

fiddle

CodePudding user response：

The count() function seems to be the most likely basic tool here. Given an expression, it returns a count of the number of rows where that expression evaluates to non-null. Given the argument *, it counts all rows in the group.

To the extent that there is a trick, it is getting the batched an unbatched counts in the same result row. There are at least three ways to do that:

Using subqueries:

select
  (select count(batch_id) from items) as batched,
  (select count(*) from items where batch_id is null) as unbatched
-- no FROM

That's pretty straightforward. Each subquery is executed and produces one column of the result. Because no FROM clause is given in the outer query, there will be exactly one result row.

Using window functions:

select
  count(batch_id) over () as batched,
  (count(*) over () - count(batch_id) over ()) as unbatched
from items
limit 1

That will compute the batched and unbatched results for the whole table on every result row, one per row of the items table, but then only one result row is actually returned. It is reasonable to hope (though you would definitely want to test) that postgres doesn't actually compute those counts for all the rows that are culled by the limit clause. You might, for example, compare the performance of this option with that of the previous option.

Using count() with a filter clause, as described in detail in another answer.