Home > Software design >  Select records that do not have at least one child element
Select records that do not have at least one child element

Time:11-13

How can I make an SQL query to select records that do not have at least one child element?

I have 3 tables: article (~40K rows), calendar (~450K rows) and calendar_cost (~500K rows).

It is necessary to select such entries of the article table:

  1. there are no entries in the calendar table,
  2. if there are entries in the calendar table, then all of them should not have any entries in the calendar_cost table.
create table article (
    id int PRIMARY KEY,
    name varchar
);

create table calendar (
    id int PRIMARY KEY,
    article_id int REFERENCES article (id) ON DELETE CASCADE,
    number varchar
);

create table calendar_cost (
    id int PRIMARY KEY,
    calendar_id int REFERENCES calendar (id) ON DELETE CASCADE,
    cost_value numeric
);

insert into article (id, name) values
(1, 'Article 1'),
(2, 'Article 2'),
(3, 'Article 3');

insert into calendar (id, article_id, number) values
(101, 1, 'Point 1-1'),
(102, 1, 'Point 1-2'),
(103, 2, 'Point 2');

insert into calendar_cost (id, calendar_id, cost_value) values
(400, 101, 100.123),
(401, 101, 400.567);

As a result, "Article 2" (condition 2) and "Article 3" (condition 1) will suit us.

My SQL query is very slow (the second condition part), how can I do it optimally? Is it possible to do without "union all" operator?

-- First condition
select a.id from article a 
left join calendar c on a.id = c.article_id
where c.id is null

union all 

-- Second condition
select a.id from article a
where id not in(
    select aa.id from article aa
    join calendar c on aa.id = c.article_id
    join calendar_cost cost on c.id = cost.calendar_id
    where aa.id = a.id limit 1
)

CodePudding user response:

Your second condition should start just like your first one: find all the calendar entries without calendar cost and only afterwards join it to article.

select a.id
from article a
  Inner Join (
    Select article_id
    From calendar c left join calendar_cost cc
      On c.id=cc.calendar_id
    Where cc.calendar_id is null
) cnone
  On a.id = cnone.article_id

This approach is based on the thought that calendar entries without calendar_cost is relatively rare compared to all the calendar entries.

CodePudding user response:

Your query is not valid as IN clauses don't support LIMIT

Adding some indexes on article_id and calender_id

Will help the performance

As you can see in the query plan

create table article (
    id int PRIMARY KEY,
    name varchar(100)
);

create table calendar (
    id int PRIMARY KEY,
    article_id int REFERENCES article (id) ON DELETE CASCADE,
    number varchar(100)
  ,index(article_id)
);

create table calendar_cost (
    id int PRIMARY KEY,
    calendar_id int REFERENCES calendar (id) ON DELETE CASCADE,
    cost_value numeric
  ,INDEX(calendar_id)
);

insert into article (id, name) values
(1, 'Article 1'),
(2, 'Article 2'),
(3, 'Article 3');

insert into calendar (id, article_id, number) values
(101, 1, 'Point 1-1'),
(102, 1, 'Point 1-2'),
(103, 2, 'Point 2');

insert into calendar_cost (id, calendar_id, cost_value) values
(400, 101, 100.123),
(401, 101, 400.567);

Records: 3  Duplicates: 0  Warnings: 0
Records: 3  Duplicates: 0  Warnings: 0
Records: 2  Duplicates: 0  Warnings: 2
select a.id from article a 
left join calendar c on a.id = c.article_id
where c.id is null

id
3
-- First condition
EXPLAIN
  select a.id from article a 
left join calendar c on a.id = c.article_id
where c.id is null

union all 

-- Second condition
select a.id from article a
JOIN (
    select aa.id from article aa
    join calendar c on aa.id = c.article_id
    join calendar_cost cost on c.id = cost.calendar_id

  LIMIT 1

) t1 ON t1.id <> a.id

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY a null index null PRIMARY 4 null 3 100.00 Using index
1 PRIMARY c null ref article_id article_id 5 fiddle.a.id 3 33.33 Using where; Not exists; Using index
2 UNION <derived3> null system null null null null 1 100.00 null
2 UNION a null index null PRIMARY 4 null 3 66.67 Using where; Using index
3 DERIVED cost null index calendar_id calendar_id 5 null 2 100.00 Using where; Using index
3 DERIVED c null eq_ref PRIMARY,article_id PRIMARY 4 fiddle.cost.calendar_id 1 100.00 Using where
3 DERIVED aa null eq_ref PRIMARY PRIMARY 4 fiddle.c.article_id 1 100.00 Using index

fiddle

CodePudding user response:

Try the following using a combination of exists criteria.

Usually, with supporting indexes, this is more performant than simply joining tables as it offers a short-circuit to get out as soon as a match is found, where as joining typically filters after all rows are joined.

select a.id 
from article a 
where not exists (
  select * from calendar c 
  where c.article_id =  a.id
)
or (exists (
  select * from calendar c 
  where c.article_id = a.id
)
  and not exists (
    select * from calendar_cost cc
    where cc.calendar_id in (select id from calendar c where c.article_id = a.id)
  )
);

CodePudding user response:

No need to split the conditions: The only condition you need to check for is that there are no calendar_cost rows whatsoever, which is the case if there are no calendar rows.

The trick is to use outer joins, which still return the parent table but have all null values when there is no join. Further, count() does not count null values, so requiring that the count of calendar_cost is zero is all you need.

select a.id
from article a
left join calendar c on c.article_id = a.id
left join calendar_cost cost on cost.calendar_id = c.id
group by a.id
having count(cost.calendar_id) = 0

See live demo.

If there are indexes on the id columns (the usual case), this query will perform quite well given the small table sizes.

  • Related