Hey, this is my first Stack Overflow question, so if I have not asked it in the best way, please let me know and I can add more info.
Context:
I have a SQL table with four columns:
ACC_ID, BLOCK_CATEGORY, START_DATE, VALUE
Their types are, respectively: (Number), (Number), (Date), (Number)
The table records every change made to an account as a new row, and ACC_ID is the unique identifier associated with the account. START_DATE is the date the change was made. For this question we can ignore the VALUE column.
I need a query that shows when each account changed to the BLOCK_CATEGORY it is currently on. The problem I am facing is that BLOCK_CATEGORY is a number from 1 to 8, and an account may have been on the same BLOCK_CATEGORY before, but we need to know when it changed to it this time.
Here is some sample data to help you understand (the date format in my sample is DD/MM/YYYY):
ACC_ID | BLOCK_CATEGORY | START_DATE | Value |
---|---|---|---|
1001 | 7 | 14/08/2022 | 5 |
1001 | 2 | 16/08/2022 | 5 |
1001 | 7 | 17/08/2022 | 10 |
1001 | 7 | 19/08/2022 | 10 |
1002 | 4 | 14/08/2022 | 3 |
1002 | 3 | 15/08/2022 | 3 |
1002 | 3 | 17/08/2022 | 9 |
1003 | 1 | 14/08/2022 | 10 |
1003 | 1 | 17/08/2022 | 13 |
1004 | 3 | 14/08/2022 | 2 |
1005 | 7 | 14/08/2022 | 11 |
1005 | 2 | 16/08/2022 | 34 |
1005 | 3 | 19/08/2022 | 1 |
1005 | 7 | 21/08/2022 | 12 |
The desired end result is:
ACC_ID | BLOCK_CATEGORY | START_DATE |
---|---|---|
1001 | 7 | 17/08/2022 |
1002 | 3 | 15/08/2022 |
1003 | 1 | 14/08/2022 |
1004 | 3 | 14/08/2022 |
1005 | 7 | 21/08/2022 |
I hope through the above example and question you understand the need. Please ask any questions you have.
The current query I am using gives me the below incorrect result:
ACC_ID | BLOCK_CATEGORY | START_DATE |
---|---|---|
1001 | 7 | 14/08/2022 |
1002 | 3 | 15/08/2022 |
1003 | 1 | 14/08/2022 |
1004 | 3 | 14/08/2022 |
1005 | 7 | 14/08/2022 |
Here is the query I am using. How can we write a query that gives the desired result, i.e. the date each account changed to its current BLOCK_CATEGORY?
SELECT *
FROM (
    SELECT ACC_ID,
           BLOCK_CATEGORY,
           START_DATE,
           ROW_NUMBER() OVER (PARTITION BY ACC_ID ORDER BY START_DATE DESC) AS RowNum
    FROM (
        SELECT ACC_ID,
               BLOCK_CATEGORY,
               MIN(START_DATE) AS START_DATE
        FROM [dbo].[ACCOUNTCHANGES]
        WHERE BLOCK_CATEGORY IS NOT NULL
        GROUP BY ACC_ID, BLOCK_CATEGORY
    ) A
) B
WHERE B.RowNum = 1
CodePudding user response:
Just looking at your required data, the following provides your desired results and should work on most RDBMS (assuming SQL Server though) - does this work for you?
Note I omitted VALUE as it's not present in your desired results.
with bc as (
    select distinct ACC_ID,
        /* current (latest) category of the account */
        First_Value(BLOCK_CATEGORY) over(partition by ACC_ID order by START_DATE desc) bc,
        Dense_Rank() over(partition by ACC_ID order by BLOCK_CATEGORY)
        + Dense_Rank() over(partition by ACC_ID order by BLOCK_CATEGORY desc) - 1 cnt /* count of distinct categories */
    from t
)
select t.ACC_ID, t.BLOCK_CATEGORY, Min(t.START_DATE) as START_DATE
from bc
join t on t.ACC_ID = bc.ACC_ID and t.BLOCK_CATEGORY = bc.bc
where bc.cnt = 1
    or t.START_DATE >= (
        /* latest date on which the account was on any other category */
        select Max(START_DATE) from t t2
        where t2.ACC_ID = t.ACC_ID and t2.BLOCK_CATEGORY != t.BLOCK_CATEGORY
    )
group by t.ACC_ID, t.BLOCK_CATEGORY
order by ACC_ID;
Working Demo Fiddle
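In case the fiddle is not reachable, the query can be tried against the question's sample data with a setup along these lines. This is only a sketch: the table name t matches the query above, while the int column types and the YYYYMMDD date literals are assumptions based on the column descriptions in the question.

```sql
-- Assumed schema: the question only states (Number), (Number), (Date), (Number).
CREATE TABLE t (
    ACC_ID         int,
    BLOCK_CATEGORY int,
    START_DATE     date,
    [VALUE]        int
);

-- Sample rows from the question; dates are written as YYYYMMDD
-- so they do not depend on the session's DD/MM vs MM/DD setting.
INSERT INTO t (ACC_ID, BLOCK_CATEGORY, START_DATE, [VALUE]) VALUES
    (1001, 7, '20220814', 5),
    (1001, 2, '20220816', 5),
    (1001, 7, '20220817', 10),
    (1001, 7, '20220819', 10),
    (1002, 4, '20220814', 3),
    (1002, 3, '20220815', 3),
    (1002, 3, '20220817', 9),
    (1003, 1, '20220814', 10),
    (1003, 1, '20220817', 13),
    (1004, 3, '20220814', 2),
    (1005, 7, '20220814', 11),
    (1005, 2, '20220816', 34),
    (1005, 3, '20220819', 1),
    (1005, 7, '20220821', 12);
```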
CodePudding user response:
You may try the following:
With Create_Groups AS
(
    Select D.ACC_ID, D.BLOCK_CATEGORY, D.START_DATE, D.RN,
           SUM(D.g_edge) Over (Partition By ACC_ID Order By START_DATE) As GRP
    From
    (
        Select ACC_ID, BLOCK_CATEGORY, START_DATE,
               Case When LAG(BLOCK_CATEGORY, 1, BLOCK_CATEGORY)
                         Over (Partition By ACC_ID Order By START_DATE) <> BLOCK_CATEGORY
                    Then 1 Else 0
               End As g_edge,
               ROW_NUMBER() Over (Partition By ACC_ID Order By START_DATE DESC) As RN
        From ACCOUNTCHANGES
    ) D
)
Select T.ACC_ID, T.BLOCK_CATEGORY, D.First_GRP_Date As START_DATE
From Create_Groups T
Join (Select ACC_ID, GRP, MIN(START_DATE) AS First_GRP_Date
      From Create_Groups
      Group By ACC_ID, GRP) D
On T.ACC_ID = D.ACC_ID And T.GRP = D.GRP
Where T.RN = 1
Order By T.ACC_ID
See a demo from db<>fiddle.
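The idea is to define groups of consecutive equal BLOCK_CATEGORY values for each ACC_ID. This is done in the Create_Groups CTE: g_edge is 1 on every row whose category differs from the previous row's, and the running sum of g_edge becomes the group number GRP. Then find the minimum date of each group and join it to the latest BLOCK_CATEGORY entry (RN = 1) for each ACC_ID.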
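To make the grouping step concrete, here is a small sketch that prints g_edge and GRP for account 1001 only. The SampleRows and Edges CTE names and the inlined rows are just for illustration (they stand in for ACCOUNTCHANGES); the expressions themselves are the same as in the query above.

```sql
-- Rows for ACC_ID 1001 from the question, inlined so the snippet runs on its own.
WITH SampleRows (ACC_ID, BLOCK_CATEGORY, START_DATE) AS
(
    SELECT v.ACC_ID, v.BLOCK_CATEGORY, v.START_DATE
    FROM (VALUES
        (1001, 7, CAST('20220814' AS date)),
        (1001, 2, CAST('20220816' AS date)),
        (1001, 7, CAST('20220817' AS date)),
        (1001, 7, CAST('20220819' AS date))
    ) v (ACC_ID, BLOCK_CATEGORY, START_DATE)
),
Edges AS
(
    -- g_edge = 1 when the category differs from the previous row of the same account
    SELECT ACC_ID, BLOCK_CATEGORY, START_DATE,
           CASE WHEN LAG(BLOCK_CATEGORY, 1, BLOCK_CATEGORY)
                     OVER (PARTITION BY ACC_ID ORDER BY START_DATE) <> BLOCK_CATEGORY
                THEN 1 ELSE 0
           END AS g_edge
    FROM SampleRows
)
-- The running sum of g_edge numbers each run of consecutive equal categories
SELECT ACC_ID, BLOCK_CATEGORY, START_DATE, g_edge,
       SUM(g_edge) OVER (PARTITION BY ACC_ID ORDER BY START_DATE) AS GRP
FROM Edges
ORDER BY ACC_ID, START_DATE;

-- ACC_ID | BLOCK_CATEGORY | START_DATE | g_edge | GRP
-- 1001   | 7              | 2022-08-14 | 0      | 0
-- 1001   | 2              | 2022-08-16 | 1      | 1
-- 1001   | 7              | 2022-08-17 | 1      | 2
-- 1001   | 7              | 2022-08-19 | 0      | 2
```

The latest row falls in GRP 2, and the earliest date in that group is 17/08/2022, which matches the desired result for 1001. The earlier stint on category 7 lands in GRP 0, so it no longer affects the minimum.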