I want to improve the performance of these 2 queries:
select object into latitude
from data
where predicate = 'latitude'
and subject = ( select object
from data
where subject = (select object
from data
where subject = 'url1' and predicate = '#isLocatedAt')
and predicate = 'http://schema.org/geo');
select object into Longitude
from data
where predicate = 'longitude'
and subject = ( select object
from data
where subject = (select object
from data
where subject = 'url1' and predicate = '#isLocatedAt')
and predicate = 'http://schema.org/geo');
I don't have an index on data.object because it's a text and it's too big.
Explain analyse (for 1 requête):
QUERY PLAN
Index Scan using subjectetpredicate on data (cost=25.19..36.76 rows=3 width=63) (actual time=34.299..34.303 rows=1 loops=1)
Index Cond: (((subject)::text = $1) AND ((predicate)::text = 'latitude'::text))
InitPlan 2 (returns $1)
-> Index Scan using subjectetpredicate on data data_2 (cost=12.94..24.50 rows=3 width=63) (actual time=31.374..31.379 rows=1 loops=1)
Index Cond: (((subject)::text = $0) AND ((predicate)::text = 'geo'::text))
InitPlan 1 (returns $0)
-> Index Scan using subjectetpredicate on data data_1 (cost=0.69..12.25 rows=3 width=63) (actual time=0.329..0.332 rows=1 loops=1)
Index Cond: (((subject)::text = 'url1'::text) AND ((predicate)::text = '#isLocatedAt'::text))
Planning Time: 1.071 ms
Execution Time: 34.359 ms
CodePudding user response:
Your query plan shows that you're doing 6 index scans sequentially. Rewrite your script so that you have to do only 4, by storing the shared result in a temporary variable:
select object into geolocation
-- ^^^^^^^^^^^^^^^^
from data
where predicate = 'http://schema.org/geo'
and subject = (select object
from data
where predicate = '#isLocatedAt'
and subject = 'url1');
select object into latitude
from data
where predicate = 'latitude'
and subject = geolocation;
-- ^^^^^^^^^^^
select object into longitude
from data
where predicate = 'longitude'
and subject = geolocation;
-- ^^^^^^^^^^^
You can achieve the same in a single query (which doesn't necessarily make it any faster or easier to read) by using a CTE or simply by flipping your subqueries to get a result with multiple columns:
select
(
select object
from data
where predicate = 'longitude'
and subject = geolocation
), (
select object
from data
where predicate = 'latitude'
and subject = geolocation
)
into longitude, latitude
from (
select object as geolocation
from data
where predicate = 'http://schema.org/geo'
and subject = (select object
from data
where predicate = '#isLocatedAt'
and subject = 'url1')
) as temp;
CodePudding user response:
Instead of nested subqueries, we can rethink this as a series of self-joins and flip it on its head. I find this much easier to understand, and it's much faster (at least on this tiny dataset).
- url1, #isLocatedAt, y
- y, http://schema.org/geo, z
- z, (latitude, longitude)
To do it in a single query, filter it on where predicate in ('latitude', 'longitude')
.
select
l2.predicate, l2.object
from data l0
join data l1 on l0.object = l1.subject and l1.predicate = 'http://schema.org/geo'
join data l2 on l1.object = l2.subject and l2.predicate in ('latitude', 'longitude')
where l0.subject = 'url1' and l0.predicate = '#isLocatedAt'
This is a couple orders of magnitude faster, though it would have to be run against a realistic amount of data to matter.
QUERY PLAN
Nested Loop (cost=0.43..24.51 rows=1 width=548) (actual time=0.040..0.042 rows=2 loops=1)
-> Nested Loop (cost=0.29..16.34 rows=1 width=32) (actual time=0.022..0.023 rows=1 loops=1)
Join Filter: (l0.object = (l1.subject)::text)
-> Index Scan using subjectetpredicate on data l0 (cost=0.14..8.16 rows=1 width=32) (actual time=0.011..0.012 rows=1 loops=1)
Index Cond: (((subject)::text = 'url1'::text) AND ((predicate)::text = '#isLocatedAt'::text))
-> Index Scan using predicate on data l1 (cost=0.14..8.16 rows=1 width=548) (actual time=0.007..0.008 rows=1 loops=1)
Index Cond: ((predicate)::text = 'http://schema.org/geo'::text)
-> Index Scan using subjectetpredicate on data l2 (cost=0.14..8.16 rows=1 width=1064) (actual time=0.017..0.018 rows=2 loops=1)
Index Cond: ((subject)::text = l1.object)
Filter: ((predicate)::text = ANY ('{latitude,longitude}'::text[]))
Planning Time: 0.192 ms
Execution Time: 0.072 ms
This approach is also a step towards generalizing it as a recursive CTE.
Note that you don't need an index on subject
because you have an index on (subject, predicate)
.