I have a table with order ID, Country, order date, product name and quantity. As you can see one unique order ID is composed of a few products/records but spread around different dates. I need my query to retrieve only records of an order that all of its records order date is later than 6/11/2022; so for example: I need the query to completely eliminate all of order 222 as at least one of its records is earlier than 6/11/2022, and the same goes for order 111 (at least one of its records is earlier than 6/11/2022). As you can see, only order 333 meets that criteria. I'm trying to group by order ID and country, and then eliminating the whole order records according to the criteria, the issue is it will just eliminate a specific record which is earlier than 6/11/2022 - but not the whole order records: code:
select order ID, order date, product, quantity from Orders table
group by order ID, country
HAVING MIN(order date) > '6/11/2022'
Orders table:
order Id | country | order date | product | quantity |
---|---|---|---|---|
222 | UK | 05/11/2022 | keyboard | 2 |
222 | UK | 05/11/2022 | motherboard | 2 |
222 | UK | 07/11/2022 | wireless mouse | 1 |
111 | Germany | 08/11/2022 | game console | 5 |
111 | Germany | 05/10/2022 | mini keyboard | 3 |
111 | Germany | 08/10/2022 | 5 mini discs bundle | 1 |
111 | Germany | 10/10/2022 | backup disc | 5 |
333 | France | 09/12/2022 | backup disc | 2 |
333 | France | 10/12/2022 | backup disc | 1 |
Query desired result:
order Id | country | order date | product | quantity |
---|---|---|---|---|
333 | France | 09/12/2022 | backup disc | 2 |
333 | France | 10/12/2022 | backup disc | 1 |
the results I'm getting - not the desired results:
order Id | country | order date | product | quantity |
---|---|---|---|---|
222 | UK | 07/11/2022 | wireless mouse | 1 |
111 | Germany | 08/11/2022 | game console | 5 |
333 | France | 09/12/2022 | backup disc | 2 |
333 | France | 10/12/2022 | backup disc | 1 |
CodePudding user response:
To eliminate a whole group which doesn't meet the minimum date in SQL Spark, you can use the MIN function and a WHERE clause in a SELECT statement.
For example, if you have a table called sales with a date column and a customer_id column, and you want to eliminate all customers whose earliest sale date is after a certain date, you could use the following query:
SELECT customer_id
FROM sales
GROUP BY customer_id
HAVING MIN(date) > '2022-01-01'
This will return a list of customer_id values for all customers whose earliest sale date is after January 1, 2022. You can then use this list to filter out these customers from the rest of your data.
For example, you could use a WHERE clause in a SELECT statement to exclude these customers from the results:
SELECT *
FROM sales
WHERE customer_id NOT IN (
SELECT customer_id
FROM sales
GROUP BY customer_id
HAVING MIN(date) > '2022-01-01'
)
CodePudding user response:
You can use window functions to compute the earliest date per order/country tuple on each row, then use that information to filter the dataset:
select *
from (
select o.*,
min(date) over(partition by order_id, country) min_date
from orders o
) o
where min_date > date '2022-11-06'
This scans the table only once, hence it should be more efficient than the group by
/join
solution.