Home > OS >  Performing a JOIN with an ORDER BY and LIMIT in the joined table in SQL/Postgres
Performing a JOIN with an ORDER BY and LIMIT in the joined table in SQL/Postgres

Time:10-13

I have a table of companies and a table of transcripts that are related to those companies — many-to-one.

I want to be able to list companies that have not been checked recently AND do not have transcripts published recently.

The problem I'm running into is when I join the tables, I can't figure out how to only limit the second condition to the most recent transcript. My query is unintentionally returning companies that have recent transcripts if it detects a transcript that is older.

Company Table

id name last_checked_at
1 ACME 2022-10-11 02:50:52.184975 00
2 MeepMeep 2022-05-12 02:50:52.184975 00
3 TNT 2022-05-12 02:50:52.184975 00

Transcripts Table

id company published_at
5 1 2022-10-11 02:50:52.184975 00
6 2 2022-10-11 02:50:52.184975 00
7 2 2022-05-12 02:50:52.184975 00
8 3 2022-06-11 02:50:52.184975 00
9 3 2022-03-12 02:50:52.184975 00

Desired Logic

  • Select company.ids
  • Where company.last_checked_at is older than 1 week from today
  • And only the most recent related transcript.published_at is older than 3 months from today

Expected Behavior

Using the data in the tables above:

  • Result does not contain ACME since it was last_checked_at within 7 days of today
  • Result does not contain MeepMeep even though last_checked_at is greater than 7 days, since the most recent transcript was published_at within 3 months of today
  • Result DOES contain TNT since last_checked_at is greater than 7 days and the most recent transcript was published_at greater than 3 months of today

Attempts

SELECT * FROM summaries s LEFT OUTER JOIN companies c ON s.company = c.id WHERE s.published_at < now() - INTERVAL '3 months' ORDER BY s.published_at ASC limit 1

CodePudding user response:

We can try using DISTINCT ON on the transcripts table:

WITH cte AS (
    SELECT DISTINCT ON (company) *
    FROM summaries
    WHERE published_at < NOW() - INTERVAL '3 months'
    ORDER BY company, published_at DESC
)

SELECT c.name, c.last_checked_at, s.published_at
FROM companies c
INNER JOIN cte s
    ON s.company = c.id
WHERE c.last_checked_at < NOW() - INTERVAL '1 week';
  • Related