I have a large query I am trying to analyze to improve the efficiency. I first ran just EXPLAIN
:
EXPLAIN
SELECT * FROM pop_allocation_sql.main_parcels_cb_intersects
WHERE parcel_id NOT IN (SELECT DISTINCT parcel_id::int
FROM pop_allocation_sql.main_bldgs_cb_intersects)
And it returned the following within seconds:
Gather (cost=20506897.97..3330358572517.40 rows=40283932 width=89)
Workers Planned: 7
-> Parallel Seq Scan on main_parcels_cb_intersects (cost=20505897.97..3330354543124.20 rows=5754847 width=89)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=20505897.97..21069329.24 rows=6107669 width=4)
-> Unique (cost=20505897.97..21014931.89 rows=6107669 width=4)
-> Sort (cost=20505897.97..20760414.93 rows=101806784 width=4)
Sort Key: ((main_bldgs_cb_intersects.parcel_id)::integer)
-> Seq Scan on main_bldgs_cb_intersects (cost=0.00..5334454.80 rows=101806784 width=4)
But this would not show me where the exact bottlenecks are occurring so I tried to run just:
EXPLAIN ANALYZE
SELECT * FROM pop_allocation_sql.main_parcels_cb_intersects
WHERE parcel_id NOT IN (SELECT DISTINCT parcel_id::int
FROM pop_allocation_sql.main_bldgs_cb_intersects)
I let this run for over an hour and nothing returned. I checked PG ADMIN and notcied the wait event said Lock: transactionid and gave a blocking PID of 119698 (I am not exactly sure what this means). Why won't my query complete?
CodePudding user response:
The session with process ID 119698 has a transaction open that holds an ACCESS EXCLUSIVE
lock on one of the tables in your query. End that transaction, if necessary with
SELECT pg_cancel_backend(119698);
If you want your query to perform better, rewrite it to
SELECT *
FROM pop_allocation_sql.main_parcels_cb_intersects AS p
WHERE NOT EXISTS (
SELECT FROM pop_allocation_sql.main_bldgs_cb_intersects AS pb
WHERE p.parcel_id = pb.parcel_id::integer
);
The DISTINCT
is quite unnecessary.