Home > Software engineering >  Postgresql- Trying to get a list of results where a date range is null
Postgresql- Trying to get a list of results where a date range is null

Time:06-22

I'm working on converting results from a MySQL database to a Postgresql one, and finding random syntax issues that searching isn't helping with.

For this one: I am trying to return a list of results for users who have not created a job in our system for over 90 days. I've tried a few variations, but here is where my code is at right now:

 select  law_firms.name as "Law Firm"
from law_firms
left join jobs on jobs.law_firm_id = law_firms.id
  where jobs.id IN (SELECT jobs.created_at::date >= 90) IS NULL
  and law_firms.deleted_at IS NULL
  and law_firms.name not like '%Temp%"'
  and law_firms.name not like '%Individual%'
  GROUP BY law_firms.name

I've tried a few different variations of ::date between and CASTS, but it's coming back with an error message of "ERROR: operator does not exist: date >= integer Hint: No operator matches the given name and argument types. You might need to add explicit type casts"

For reference, the MySQL code that worked in our BI originally was:

where `jobs`.`id` IN (SELECT `jobs`.`created_at` BETWEEN NOW() - INTERVAL 90 DAY AND NOW()) IS NULL

CodePudding user response:

As per my comment, isn't this as simple as converting the bracketed subquery

from:

SELECT `jobs`.`created_at` BETWEEN NOW() - INTERVAL 90 DAY AND NOW()

To this:

SELECT `jobs`.`created_at` BETWEEN NOW() - interval '90 days' AND NOW()

edit: of course, you need to add WHERE and FROM to match the original subquery, but your example original subquery doesn't include it.

I think it would look like this:

SELECT `jobs`.`created_at` FROM `jobs` WHERE `jobs`.`created_at BETWEEN NOW() - INTERVAL 90 DAY AND NOW()

CodePudding user response:

Solving Strategy

  • Get the most recent jobs for every law firm
  • Filter out most recent jobs, that are younger than 90 days
  • Join with table law_firm

Most Recent Job per Law Firm

SELECT DISTINCT ON (law_firm_id) law_firm_id, created_at
FROM job
ORDER BY law_firm_id, created_at DESC
  • SELECT DISTINCT ON (law_firm_id)
    • Will return every law_firm_id once
    • It will select the first row for a law_firm_id in the order of ORDER BY
    • ORDER BY [...] created_at DESC will return the youngest job for the law_firm_id

Postgres Docu: https://www.postgresql.org/docs/current/sql-select.html#SQL-DISTINCT

Filter out Law Firms with Jobs younger than 90 days

Calculate the cutoff date in postgres with INTERVAL: NOW() - INTERVAL '90 days'

SELECT j.*
FROM (
   SELECT DISTINCT ON (law_firm_id) law_firm_id, created_at
   FROM job
   ORDER BY law_firm_id, created_at DESC
) as j
WHERE j.created_at < NOW() - INTERVAL '90 days'
  • NOW() - INTERVAL '90 days' - calculate cutoff timestamp
  • created_at assuming it's a timestamp or timestamp with time zone

Postgres Docu: https://www.postgresql.org/docs/current/functions-datetime.html

Final Query

SELECT law_firms.name as "Law Firm"
FROM law_firms
JOIN (
   SELECT DISTINCT ON (law_firm_id) law_firm_id, created_at
   FROM job
   ORDER BY law_firm_id, created_at DESC
) as j
ON j.law_firm_id = law_firms.id
WHERE j.created_at < NOW() - INTERVAL '90 days'
  AND law_firms.deleted_at IS NULL
  AND law_firms.name not like '%Temp%"'
  AND law_firms.name not like '%Individual%'
  • Query does not use IN, unfortunately the IN is not very fast in postgres (e.g. see: postgres query with IN is very slow)
  • I assumed you wanted all firm that had a job at least once. If you needed all law firms with zero jobs, the query would be wrong.
  • Related