I have customers, jobs, orders and dates as columns. Customers can have multiple orders per job and of course multiple jobs given a certain date range.
What I am trying to do is figure out how many "new" orders there are within a certain date range. Meaning, if a job has 5 orders, it should only return 1 order.
Here's a short example of a database table, desired result, and the query I have been trying to work with thus far.
------- ------- --------- ------------
| CusID | JobID | OrderNo | OrderDate |
------- ------- --------- ------------
| 1 | 10 | 25 | 2021-12-22 |
| 1 | 10 | 26 | 2022-02-09 |
| 3 | 5 | 28 | 2022-01-10 |
| 3 | 6 | 29 | 2022-01-11 |
------- ------- --------- ------------
There are 4 rows of orders with their associated JobID
. The query should return 2 rows since order 25 and order 26 are both associated with JobID
10, only 1 can be counted as "new", while OrderNo
25 falls outside of the outer queries date range.
Here is the query I am trying to use, based on the date being >=
to the first of the year. I want to be able to use the OrderDate
from the outer query, to compare with the OrderDate
from the inner query. If there is a JobID
with an OrderDate
that is less than the current rows OrderDate
that order is NOT new and should return the JobID
, thus making the inner query false and filtering out the row.
SELECT * FROM orders
WHERE OrderDate >= '2022-01-01'
AND JobID NOT IN
(SELECT JobID FROM orders WHERE inner.OrderDate < outer.OrderDate)
Expected result
| 3 | 5 | 28 | 2022-01-10 |
| 3 | 6 | 29 | 2022-01-11 |
CodePudding user response:
Your query is heading in the right direction you just need to apply the alias to the outer table so it is uniquely identifiable -
SELECT *
FROM orders `outer`
WHERE OrderDate >= '2022-01-01'
AND JobID NOT IN
(SELECT JobID FROM orders WHERE OrderDate < `outer`.OrderDate);
Depending on the number of rows included in your time range, and the average number of orders per job, you may want to aggregate first -
SELECT *
FROM (
SELECT JobID, MIN(OrderDate) AS MinOrderDate
FROM orders
WHERE OrderDate >= '2022-01-01'
GROUP BY JobID
) o
WHERE JobID NOT IN
(SELECT JobID FROM orders WHERE OrderDate < o.MinOrderDate);
Assuming it is possible for a job to have multiple orders on one date, you might want to consider using OrderNo instead of OrderDate for your comparison -
SELECT *
FROM orders `outer`
WHERE OrderDate >= '2022-01-01'
AND JobID NOT IN
(SELECT JobID FROM orders WHERE OrderNo < `outer`.OrderNo);
CodePudding user response:
Use row_number
to rank the orders for each job. Put that in a subquery and only accept the newest oldest? order for each job.
select custid, jobid, orderNo, orderDate
from (
select
*,
row_number() over ( partition by jobid order by orderDate asc ) as oldest
from orders
) a
where a.oldest = 1
and orderDate >= '2022-01-01'