eliminating a whole group which doesn't meet minimum date

I have a table with order ID, Country, order date, product name and quantity. As you can see one unique order ID is composed of a few products/records but spread around different dates. I need my query to retrieve only records of an order that all of its records order date is later than 6/11/2022; so for example: I need the query to completely eliminate all of order 222 as at least one of its records is earlier than 6/11/2022, and the same goes for order 111 (at least one of its records is earlier than 6/11/2022). As you can see, only order 333 meets that criteria. I'm trying to group by order ID and country, and then eliminating the whole order records according to the criteria, the issue is it will just eliminate a specific record which is earlier than 6/11/2022 - but not the whole order records: code:

select order ID, order date, product, quantity from Orders table
group by order ID, country
HAVING MIN(order date) > '6/11/2022'

Orders table:

order Id	country	order date	product	quantity
222	UK	05/11/2022	keyboard	2
222	UK	05/11/2022	motherboard	2
222	UK	07/11/2022	wireless mouse	1
111	Germany	08/11/2022	game console	5
111	Germany	05/10/2022	mini keyboard	3
111	Germany	08/10/2022	5 mini discs bundle	1
111	Germany	10/10/2022	backup disc	5
333	France	09/12/2022	backup disc	2
333	France	10/12/2022	backup disc	1

Query desired result:

order Id	country	order date	product	quantity
333	France	09/12/2022	backup disc	2
333	France	10/12/2022	backup disc	1

the results I'm getting - not the desired results:

order Id	country	order date	product	quantity
222	UK	07/11/2022	wireless mouse	1
111	Germany	08/11/2022	game console	5
333	France	09/12/2022	backup disc	2
333	France	10/12/2022	backup disc	1

CodePudding user response：

To eliminate a whole group which doesn't meet the minimum date in SQL Spark, you can use the MIN function and a WHERE clause in a SELECT statement.

For example, if you have a table called sales with a date column and a customer_id column, and you want to eliminate all customers whose earliest sale date is after a certain date, you could use the following query:

SELECT customer_id
FROM sales
GROUP BY customer_id
HAVING MIN(date) > '2022-01-01'

This will return a list of customer_id values for all customers whose earliest sale date is after January 1, 2022. You can then use this list to filter out these customers from the rest of your data.

For example, you could use a WHERE clause in a SELECT statement to exclude these customers from the results:

SELECT *
FROM sales
WHERE customer_id NOT IN (
  SELECT customer_id
  FROM sales
  GROUP BY customer_id
  HAVING MIN(date) > '2022-01-01'
)

CodePudding user response：

You can use window functions to compute the earliest date per order/country tuple on each row, then use that information to filter the dataset:

select *
from (
    select o.*,
        min(date) over(partition by order_id, country) min_date
    from orders o
) o
where min_date > date '2022-11-06'

This scans the table only once, hence it should be more efficient than the group by/join solution.