Row estimation for JOIN queries

How does PostgreSQL estimate the number of rows in a JOIN query like:

EXPLAIN 
SELECT * 
FROM R, S 
WHERE (R.StartTime < S.EndTime) AND (S.StartTime < R.EndTime);

CodePudding user response:

Assuming that the data type involved is timestamp with time zone (but it does not really matter, as we will see), the join selectivity estimation function can be found with:

SELECT oprjoin
FROM pg_operator
WHERE oprname = '<'
  AND oprleft = 'timestamptz'::regtype
  AND oprright = 'timestamptz'::regtype;

     oprjoin     
═════════════════
 scalarltjoinsel
(1 row)
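For comparison, the restriction selectivity estimator (used when one side of the comparison is a constant rather than another table's column) sits in the oprrest column of the same catalog row; the analogous query should return scalarltsel:

SELECT oprrest
FROM pg_operator
WHERE oprname = '<'
  AND oprleft = 'timestamptz'::regtype
  AND oprright = 'timestamptz'::regtype;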

That function is defined in src/backend/utils/adt/selfuncs.c:

/*
 *      scalarltjoinsel - Join selectivity of "<" for scalars
 */
Datum
scalarltjoinsel(PG_FUNCTION_ARGS)
{
    PG_RETURN_FLOAT8(DEFAULT_INEQ_SEL);
}

This is defined in src/include/utils/selfuncs.h as

/* default selectivity estimate for inequalities such as "A < b" */
#define DEFAULT_INEQ_SEL  0.3333333333333333

So, simple as it sounds, PostgreSQL estimates that a single inequality join condition will filter out two thirds of the rows. Since there are two such conditions, their selectivities are multiplied, and PostgreSQL will estimate the row count of the result as

(#rows in R) * (#rows in S) / 9
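You can watch this constant at work with a toy query. On reasonably recent versions, where the planner knows the exact row count of generate_series with constant arguments, a single inequality join condition should be estimated at one third of the Cartesian product (100 * 100 / 3 ≈ 3333):

EXPLAIN
SELECT *
FROM generate_series(1, 100) AS a(x),
     generate_series(1, 100) AS b(y)
WHERE a.x < b.y;  -- expect an estimate around rows=3333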

As yet, PostgreSQL does not have any cross-table statistics that could make this less crude.

CodePudding user response:

There is a chapter in the manual addressing your question exactly: "Row Estimation Examples", in the chapter on how the planner uses statistics. It explains what Laurenz provided, among other things.

But that wasn't the full story yet. We also need the number of rows (cardinalities) of the underlying tables. Postgres uses estimate_rel_size(), defined in src/backend/optimizer/util/plancat.c:

/*
 * estimate_rel_size - estimate # pages and # tuples in a table or index
 *
 * We also estimate the fraction of the pages that are marked all-visible in
 * the visibility map, for use in estimation of index-only scans.
 *
 * If attr_widths isn't NULL, it points to the zero-index entry of the
 * relation's attr_widths[] cache; we fill this in if we have need to compute
 * the attribute widths for estimation purposes.
 */
void
estimate_rel_size(Relation rel, int32 *attr_widths,
                  BlockNumber *pages, double *tuples, double *allvisfrac)
...

Here is a minimal SQL query to reproduce the calculation (ignoring some corner cases):

SELECT (reltuples / relpages * (pg_relation_size(oid) / 8192))::bigint
FROM   pg_class
WHERE  oid = 'mytable'::regclass;  -- your table here
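The division by 8192 assumes the default block size of 8 kB. A variant of the same calculation that reads the actual block size from the server configuration (a sketch, otherwise identical logic):

SELECT (reltuples / relpages
      * (pg_relation_size(oid) / current_setting('block_size')::bigint))::bigint
FROM   pg_class
WHERE  oid = 'mytable'::regclass;  -- your table here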

Example

CREATE TEMP TABLE r(id serial, start_time timestamptz, end_time timestamptz);
CREATE TEMP TABLE s(id serial, start_time timestamptz, end_time timestamptz);

INSERT INTO r(start_time, end_time)
SELECT now(), now()  -- actual values don't matter for this particular case
FROM generate_series (1, 5000);

INSERT INTO s(start_time, end_time)
SELECT now(), now()
FROM generate_series (1, 10000);

VACUUM r, s;  -- set reltuples & relpages in pg_class

-- add 2000 rows to S
INSERT INTO s(start_time, end_time)
SELECT now(), now()
FROM generate_series (1, 2000);

pg_class still shows reltuples = 5000 and 10000, but we know there are 5000 and 12000 rows in R and S. (Since these are temporary tables, they are not covered by autovacuum, so the numbers are never updated automatically.) Check:

SELECT relname, reltuples, relpages  -- 5000 | 10000
FROM   pg_class c
WHERE  c.oid IN ('pg_temp.r'::regclass, 'pg_temp.s'::regclass);

SELECT count(*) FROM r; -- 5000
SELECT count(*) FROM s; -- 12000

Query plan:

EXPLAIN
SELECT *
FROM r, s
WHERE (r.start_time < s.end_time) AND (s.start_time < r.end_time);

'Nested Loop  (cost=0.00..1053004.31 rows=6683889 width=40)'
'  Join Filter: ((r.start_time < s.end_time) AND (s.start_time < r.end_time))'
'  ->  Seq Scan on s  (cost=0.00..197.31 rows=12031 width=20)'
'  ->  Materialize  (cost=0.00..107.00 rows=5000 width=20)'
'        ->  Seq Scan on r  (cost=0.00..82.00 rows=5000 width=20)'
'JIT:'
'  Functions: 6'
'  Options: Inlining true, Optimization true, Expressions true, Deforming true'

Postgres estimates rows=12031 for table s. A pretty good estimate; the algorithm worked.
The estimate is more easily thrown off by deleting rows, as the physical size of the table doesn't shrink automatically. It's a good idea to run VACUUM ANALYZE after a major DELETE, or even VACUUM FULL ANALYZE.
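To refresh the statistics by hand at this point, a minimal sketch (using the tables from above):

ANALYZE s;  -- updates reltuples & relpages in pg_class

SELECT reltuples FROM pg_class WHERE oid = 'pg_temp.s'::regclass;  -- should now show ~12000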

Postgres expects rows=6683889 for the join, which matches our expectation (as per Laurenz's explanation):

SELECT 5000 * 12031 * 0.3333333333333333^2  -- 6683888.89

Better query

Your example query is just that: an example. But it happens to be a poor one, as the same can be achieved more efficiently with range types and operators, specifically tstzrange and &&:

Selectivity for &&?

SELECT oprjoin  -- areajoinsel
FROM pg_operator
WHERE oprname = '&&'
  AND oprleft = 'anyrange'::regtype
  AND oprright = 'anyrange'::regtype;

The source code in src/backend/utils/adt/geo_selfuncs.c:

Datum
areajoinsel(PG_FUNCTION_ARGS)
{
    PG_RETURN_FLOAT8(0.005);
}

Much more selective: 0.005 << 0.333! And typically more realistic.

EXPLAIN
SELECT *
FROM r, s
WHERE tstzrange(r.start_time, r.end_time) && tstzrange(s.start_time, s.end_time);

Happens to be exactly equivalent, since tstzrange defaults to including the lower bound and excluding the upper bound, matching the strict inequalities.
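A quick sanity check of those bound semantics: ranges that merely touch do not overlap, just as the strict inequalities would demand (the values are arbitrary):

SELECT tstzrange(now(), now() + interval '1 hour')
    && tstzrange(now() + interval '1 hour', now() + interval '2 hours');  -- false: bounds only touch

I get this query plan: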

'Nested Loop  (cost=0.00..1203391.81 rows=300775 width=40)'
'  Join Filter: (tstzrange(r.start_time, r.end_time) && tstzrange(s.start_time, s.end_time))'
'  ->  Seq Scan on s  (cost=0.00..197.31 rows=12031 width=20)'
'  ->  Materialize  (cost=0.00..107.00 rows=5000 width=20)'
'        ->  Seq Scan on r  (cost=0.00..82.00 rows=5000 width=20)'
'JIT:'
'  Functions: 6'
'  Options: Inlining true, Optimization true, Expressions true, Deforming true'

Our expectation:

SELECT 5000 * 12031 * 0.005  -- 300775.000

It's a Bingo!
And this query can be supported with an index efficiently, changing the game ...
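A minimal sketch of such an index (the index name is made up; the default GiST operator class for range types supports &&):

CREATE INDEX s_timerange_gist ON s
USING gist (tstzrange(start_time, end_time));

With an index like that in place, the join condition can be evaluated with index scans on s instead of materializing and rescanning the whole table.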
