Home > Mobile >  eliminating a whole group which doesn't meet minimum date - SQL Spark
eliminating a whole group which doesn't meet minimum date - SQL Spark

Time:01-07

I have a table with order ID, Country, order date, product name and quantity. As you can see one unique order ID is composed of a few products/records but spread around different dates. I need my query to retrieve only records of an order that all of its records order date is later than 6/11/2022; so for example: I need the query to completely eliminate all of order 222 as at least one of its records is earlier than 6/11/2022, and the same goes for order 111 (at least one of its records is earlier than 6/11/2022). As you can see, only order 333 meets that criteria. I'm trying to group by order ID and country, and then eliminating the whole order records according to the criteria, the issue is it will just eliminate a specific record which is earlier than 6/11/2022 - but not the whole order records: code:

select order ID, order date, product, quantity from Orders table
group by order ID, country
HAVING MIN(order date) > '6/11/2022'

Orders table:

order Id country order date product quantity
222 UK 05/11/2022 keyboard 2
222 UK 05/11/2022 motherboard 2
222 UK 07/11/2022 wireless mouse 1
111 Germany 08/11/2022 game console 5
111 Germany 05/10/2022 mini keyboard 3
111 Germany 08/10/2022 5 mini discs bundle 1
111 Germany 10/10/2022 backup disc 5
333 France 09/12/2022 backup disc 2
333 France 10/12/2022 backup disc 1

Query desired result:

order Id country order date product quantity
333 France 09/12/2022 backup disc 2
333 France 10/12/2022 backup disc 1

the results I'm getting - not the desired results:

order Id country order date product quantity
222 UK 07/11/2022 wireless mouse 1
111 Germany 08/11/2022 game console 5
333 France 09/12/2022 backup disc 2
333 France 10/12/2022 backup disc 1

CodePudding user response:

To eliminate a whole group which doesn't meet the minimum date in SQL Spark, you can use the MIN function and a WHERE clause in a SELECT statement.

For example, if you have a table called sales with a date column and a customer_id column, and you want to eliminate all customers whose earliest sale date is after a certain date, you could use the following query:

SELECT customer_id
FROM sales
GROUP BY customer_id
HAVING MIN(date) > '2022-01-01'

This will return a list of customer_id values for all customers whose earliest sale date is after January 1, 2022. You can then use this list to filter out these customers from the rest of your data.

For example, you could use a WHERE clause in a SELECT statement to exclude these customers from the results:

SELECT *
FROM sales
WHERE customer_id NOT IN (
  SELECT customer_id
  FROM sales
  GROUP BY customer_id
  HAVING MIN(date) > '2022-01-01'
)

CodePudding user response:

You can use window functions to compute the earliest date per order/country tuple on each row, then use that information to filter the dataset:

select *
from (
    select o.*,
        min(date) over(partition by order_id, country) min_date
    from orders o
) o
where min_date > date '2022-11-06'

This scans the table only once, hence it should be more efficient than the group by/join solution.

  • Related