How can I improve multiple queries into one-CodePudding

I want to improve the performance of these 2 queries:

select object into latitude  
from data
where predicate = 'latitude' 
and subject = ( select object 
                from data 
                where subject = (select object 
                                 from data 
                                  where subject = 'url1' and predicate = '#isLocatedAt') 
and predicate = 'http://schema.org/geo');

select object into Longitude  
from data 
where predicate = 'longitude' 
and subject = ( select object 
                from data 
                where subject = (select object 
                                 from data 
                                 where subject = 'url1' and predicate = '#isLocatedAt') 
and predicate = 'http://schema.org/geo');

fiddle

I don't have an index on data.object because it's a text and it's too big.

Explain analyse (for 1 requête):

QUERY PLAN
Index Scan using subjectetpredicate on data  (cost=25.19..36.76 rows=3 width=63) (actual time=34.299..34.303 rows=1 loops=1)
  Index Cond: (((subject)::text = $1) AND ((predicate)::text = 'latitude'::text))
  InitPlan 2 (returns $1)
    ->  Index Scan using subjectetpredicate on data data_2  (cost=12.94..24.50 rows=3 width=63) (actual time=31.374..31.379 rows=1 loops=1)
          Index Cond: (((subject)::text = $0) AND ((predicate)::text = 'geo'::text))
          InitPlan 1 (returns $0)
            ->  Index Scan using subjectetpredicate on data data_1  (cost=0.69..12.25 rows=3 width=63) (actual time=0.329..0.332 rows=1 loops=1)
                  Index Cond: (((subject)::text = 'url1'::text) AND ((predicate)::text = '#isLocatedAt'::text))
Planning Time: 1.071 ms
Execution Time: 34.359 ms

CodePudding user response：

Your query plan shows that you're doing 6 index scans sequentially. Rewrite your script so that you have to do only 4, by storing the shared result in a temporary variable:

select object into geolocation
--            ^^^^^^^^^^^^^^^^
from data
where predicate = 'http://schema.org/geo'
  and subject = (select object 
                 from data 
                 where predicate = '#isLocatedAt'
                   and subject = 'url1');

select object into latitude  
from data
where predicate = 'latitude' 
  and subject = geolocation;
--              ^^^^^^^^^^^

select object into longitude  
from data 
where predicate = 'longitude' 
  and subject = geolocation;
--              ^^^^^^^^^^^

You can achieve the same in a single query (which doesn't necessarily make it any faster or easier to read) by using a CTE or simply by flipping your subqueries to get a result with multiple columns:

select
  (
    select object  
    from data
    where predicate = 'longitude'
      and subject = geolocation
  ), (
    select object
    from data
    where predicate = 'latitude'
      and subject = geolocation
  )
  into longitude, latitude
from (
  select object as geolocation
  from data
  where predicate = 'http://schema.org/geo'
    and subject = (select object 
                   from data 
                   where predicate = '#isLocatedAt'
                     and subject = 'url1')
) as temp;

CodePudding user response：

Instead of nested subqueries, we can rethink this as a series of self-joins and flip it on its head. I find this much easier to understand, and it's much faster (at least on this tiny dataset).

url1, #isLocatedAt, y
y, http://schema.org/geo, z
z, (latitude, longitude)

To do it in a single query, filter it on where predicate in ('latitude', 'longitude').

select
  l2.predicate, l2.object
from data l0
join data l1 on l0.object = l1.subject and l1.predicate  = 'http://schema.org/geo'
join data l2 on l1.object = l2.subject and l2.predicate in ('latitude', 'longitude')
where l0.subject = 'url1' and l0.predicate = '#isLocatedAt'

This is a couple orders of magnitude faster, though it would have to be run against a realistic amount of data to matter.

QUERY PLAN
Nested Loop  (cost=0.43..24.51 rows=1 width=548) (actual time=0.040..0.042 rows=2 loops=1)
  ->  Nested Loop  (cost=0.29..16.34 rows=1 width=32) (actual time=0.022..0.023 rows=1 loops=1)
        Join Filter: (l0.object = (l1.subject)::text)
        ->  Index Scan using subjectetpredicate on data l0  (cost=0.14..8.16 rows=1 width=32) (actual time=0.011..0.012 rows=1 loops=1)
              Index Cond: (((subject)::text = 'url1'::text) AND ((predicate)::text = '#isLocatedAt'::text))
        ->  Index Scan using predicate on data l1  (cost=0.14..8.16 rows=1 width=548) (actual time=0.007..0.008 rows=1 loops=1)
              Index Cond: ((predicate)::text = 'http://schema.org/geo'::text)
  ->  Index Scan using subjectetpredicate on data l2  (cost=0.14..8.16 rows=1 width=1064) (actual time=0.017..0.018 rows=2 loops=1)
        Index Cond: ((subject)::text = l1.object)
        Filter: ((predicate)::text = ANY ('{latitude,longitude}'::text[]))
Planning Time: 0.192 ms
Execution Time: 0.072 ms

Demonstration.

This approach is also a step towards generalizing it as a recursive CTE.

Note that you don't need an index on subject because you have an index on (subject, predicate).