I have a large dataset where we keep track of all revenue collected by client for each month. So each client gets a row for each month they were charged, resulting in a lot of rows. I am trying to find a way to pull the most recent record for each client.
Here is what i have:
|merchant name|Merchant id|revenue date|revenue amount|
----------------------------------------------------------
|fish| 1234| 2022-03-01| 200|
|fish| 1234| 2022-04-01| 200|
|fish| 1234| 2022-05-01| 200|
|fish| 1234| 2022-06-01| 200|
|dog| 5678| 2022-01-01| 200|
|dog| 5678| 2022-02-01| 200|
|dog| 5678| 2022-03-01| 200|
|dog| 5678| 2022-04-01| 200|
|cat| 1011| 2022-10-01| 200|
|cat| 1011| 2022-11-01| 200|
My desired result is:
|merchant name|Merchant id|revenue date|revenue amount|
----------------------------------------------------------
|fish| 1234| 2022-06-01| 200|
|dog| 5678| 2022-04-01| 200|
|cat| 1011| 2022-11-01| 200|
I have tried this:
Select distinct
merchant_name,
merchant_id,
revenue_date,
revenue_amount
from table
where revenue_date=(select max(revenue_date) from table)
but that is only returning rows that match the maximum date listed (2022-11-01).
Any help is much appreciated!
CodePudding user response:
The general solution takes the form:
select *
from (
select t.*, row_number() over(partition by merchant_name
order by revenue_date desc) as rn
from t
) x
where rn = 1
Or... in PostgreSQL you can just do:
select distinct on (merchant_name) *
from t
order by merchant_name, revenue_date desc
CodePudding user response:
Alternate answer (formatted for SQL Server):
--For each merchantID, find the latest date
WITH maxDates AS (
SELECT
merchantID,
MAX(revenueDate) revenueDate
FROM @input
GROUP BY merchantID
)
--Use the defined CTE to retrieve the rest of the columns
SELECT i.*
FROM @input i
INNER JOIN maxDates m ON i.merchantID=m.merchantID AND i.revenueDate=m.revenueDate