I have a table with three columns, X, Y and Z, where X is the identifier. I want a row number partitioned by Y that restarts whenever the dates in Z stop being consecutive. For example, I have this table:
X Y Z
A 1 1-jan
A 1 2-jan
A 1 3-jan
B 3 1-jan
B 3 2-jan
A 1 5-jan
The result should look like this:
X Y Z rn
A 1 1-jan 1
A 1 2-jan 2
A 1 3-jan 3
B 3 1-jan 1
B 3 2-jan 2
A 1 5-jan 1
The code I am using right now:
select X, Y, Z, ROW_NUMBER() over (partition by Y order by Z) as rn
I am getting this as my result (This is not the result I want):
X Y Z rn
A 1 1-jan 1
A 1 2-jan 2
A 1 3-jan 3
B 3 1-jan 1
B 3 2-jan 2
A 1 5-jan 4 <---- Column Z is 5-jan, not 4-jan, so this should not be row 4. It should restart at row 1
CodePudding user response:
You first need to create data that can be used to partition your table.
The below uses LAG() to determine if a row is a "new partition", then SUM() OVER () to propagate that flag forward and make a "partition id", then finally uses ROW_NUMBER() with that identifier.
WITH
  gap_marker AS
(
  SELECT
    yourTable.*,
    IIF(
      LAG(z) OVER (PARTITION BY y ORDER BY z)
      =
      DATEADD(day, -1, z),
      0,
      1
    )
      AS new_date_range
  FROM
    yourTable
),
  date_range_partition AS
(
  SELECT
    gap_marker.*,
    SUM(new_date_range) OVER (PARTITION BY y ORDER BY z) AS date_range_id
  FROM
    gap_marker
)
SELECT
  x, y, z,
  ROW_NUMBER() OVER (PARTITION BY y, date_range_id ORDER BY z) AS rn
FROM
  date_range_partition
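To see how the flag and the running sum behave, it can help to look at the intermediate columns on the question's sample data. The following is only a sketch and not part of the original answer: it inlines the six rows as a CTE named yourTable and assumes the year 2023, since the question gives only day and month.

```sql
-- Sample rows from the question; the year 2023 is an assumption.
WITH
  yourTable (x, y, z) AS
(
  SELECT v.x, v.y, CAST(v.z AS date)
  FROM (VALUES
    ('A', 1, '20230101'),
    ('A', 1, '20230102'),
    ('A', 1, '20230103'),
    ('B', 3, '20230101'),
    ('B', 3, '20230102'),
    ('A', 1, '20230105')
  ) AS v (x, y, z)
),
  gap_marker AS
(
  SELECT
    yourTable.*,
    -- 1 when the previous row in the same y is missing or not exactly one day earlier
    IIF(LAG(z) OVER (PARTITION BY y ORDER BY z) = DATEADD(day, -1, z), 0, 1) AS new_date_range
  FROM yourTable
)
SELECT
  x, y, z,
  new_date_range,
  -- running total of the flags: every gap bumps the id by one
  SUM(new_date_range) OVER (PARTITION BY y ORDER BY z) AS date_range_id
FROM gap_marker
ORDER BY y, z;

-- x  y  z           new_date_range  date_range_id
-- A  1  2023-01-01  1               1
-- A  1  2023-01-02  0               1
-- A  1  2023-01-03  0               1
-- A  1  2023-01-05  1               2
-- B  3  2023-01-01  1               1
-- B  3  2023-01-02  0               1
```

The first row of each y has no previous row, so LAG() returns NULL, the comparison is not true, and IIF() falls through to 1. That is what starts date_range_id at 1 for every partition.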
Alternatively, you could calculate an amount to deduct from the current rn, resetting it to 1 whenever a date is skipped.
WITH
  enumerated AS
(
  SELECT
    yourTable.*,
    ROW_NUMBER() OVER (PARTITION BY y ORDER BY z) AS rn,
    DATEDIFF(
      day,
      LAG(z) OVER (PARTITION BY y ORDER BY z),
      z
    )
      AS delta
  FROM
    yourTable
)
SELECT
  x, y, z,
  rn - MAX(IIF(delta = 1, 0, rn - 1)) OVER (PARTITION BY y ORDER BY z) AS rn
FROM
  enumerated
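The deduction is easier to follow when the pieces are shown side by side. This sketch (not from the original answer) reuses the question's rows with an assumed year of 2023; the column names raw_rn and amount_to_deduct are only illustrative.

```sql
-- Sample rows from the question; the year 2023 is an assumption.
WITH
  yourTable (x, y, z) AS
(
  SELECT v.x, v.y, CAST(v.z AS date)
  FROM (VALUES
    ('A', 1, '20230101'),
    ('A', 1, '20230102'),
    ('A', 1, '20230103'),
    ('B', 3, '20230101'),
    ('B', 3, '20230102'),
    ('A', 1, '20230105')
  ) AS v (x, y, z)
),
  enumerated AS
(
  SELECT
    yourTable.*,
    ROW_NUMBER() OVER (PARTITION BY y ORDER BY z) AS rn,
    DATEDIFF(day, LAG(z) OVER (PARTITION BY y ORDER BY z), z) AS delta
  FROM yourTable
)
SELECT
  x, y, z,
  rn AS raw_rn,
  delta,
  -- at a gap this is rn - 1; the running MAX carries it forward to later rows
  MAX(IIF(delta = 1, 0, rn - 1)) OVER (PARTITION BY y ORDER BY z) AS amount_to_deduct
FROM enumerated
ORDER BY y, z;

-- x  y  z           raw_rn  delta  amount_to_deduct
-- A  1  2023-01-01  1       NULL   0
-- A  1  2023-01-02  2       1      0
-- A  1  2023-01-03  3       1      0
-- A  1  2023-01-05  4       2      3
-- B  3  2023-01-01  1       NULL   0
-- B  3  2023-01-02  2       1      0
```

Subtracting amount_to_deduct from raw_rn gives 1, 2, 3, 1 for y = 1 and 1, 2 for y = 3. The running MAX works because rn - 1 only ever grows within a partition, so the most recent gap always supplies the largest value.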
Finally, you could use DATEDIFF() if your rows are always whole days apart. Window functions can be used to work out what you should compare the current row against, and avoid ROW_NUMBER() altogether.
WITH
  check_previous AS
(
  SELECT
    yourTable.*,
    IIF(
      LAG(z) OVER (PARTITION BY y ORDER BY z)
      =
      DATEADD(day, -1, z),
      NULL,
      z
    )
      AS new_base_date
  FROM
    yourTable
)
SELECT
  x, y, z,
  DATEDIFF(
    day,
    MAX(new_base_date) OVER (PARTITION BY y ORDER BY z),
    z
  ) + 1
    AS rn
FROM
  check_previous
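As a rough illustration of the base-date idea, the sketch below shows new_base_date and the value the running MAX() carries forward. It is not part of the original answer; the year 2023 and the column name base_date are assumptions made for the example.

```sql
-- Sample rows from the question; the year 2023 is an assumption.
WITH
  yourTable (x, y, z) AS
(
  SELECT v.x, v.y, CAST(v.z AS date)
  FROM (VALUES
    ('A', 1, '20230101'),
    ('A', 1, '20230102'),
    ('A', 1, '20230103'),
    ('B', 3, '20230101'),
    ('B', 3, '20230102'),
    ('A', 1, '20230105')
  ) AS v (x, y, z)
),
  check_previous AS
(
  SELECT
    yourTable.*,
    -- NULL while the run continues, the row's own date when a new run starts
    IIF(LAG(z) OVER (PARTITION BY y ORDER BY z) = DATEADD(day, -1, z), NULL, z) AS new_base_date
  FROM yourTable
)
SELECT
  x, y, z,
  new_base_date,
  -- MAX() skips NULLs, so this is the start date of the current run
  MAX(new_base_date) OVER (PARTITION BY y ORDER BY z) AS base_date
FROM check_previous
ORDER BY y, z;

-- x  y  z           new_base_date  base_date
-- A  1  2023-01-01  2023-01-01     2023-01-01
-- A  1  2023-01-02  NULL           2023-01-01
-- A  1  2023-01-03  NULL           2023-01-01
-- A  1  2023-01-05  2023-01-05     2023-01-05
-- B  3  2023-01-01  2023-01-01     2023-01-01
-- B  3  2023-01-02  NULL           2023-01-01
```

The number of days between base_date and z, plus one, then gives 1, 2, 3, 1 for y = 1 and 1, 2 for y = 3.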
Demo of all three: https://dbfiddle.uk/K8x8gOqh
CodePudding user response:
Supposing that column Z is a date column, you could try the following:
SELECT X, Y, Z,
       ROW_NUMBER() OVER (PARTITION BY X, GRP ORDER BY Z) AS RN
FROM
(
  SELECT *,
         DATEDIFF(DAY, ROW_NUMBER() OVER (PARTITION BY X ORDER BY Z), Z) AS GRP
  FROM table_name
) T
ORDER BY X, Z
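The grouping key here relies on the fact that, within a run of consecutive dates, the date and the row number both advance by one per row, so their difference stays constant. The sketch below illustrates that with the question's rows. It is an illustration only: it subtracts the row number with DATEADD() instead of the DATEDIFF() expression used in the query above, which gives a key that is a readable date, and it assumes the year 2023 and an illustrative column name seq.

```sql
-- Sample rows from the question; the year 2023 is an assumption.
WITH
  table_name (X, Y, Z) AS
(
  SELECT v.X, v.Y, CAST(v.Z AS date)
  FROM (VALUES
    ('A', 1, '20230101'),
    ('A', 1, '20230102'),
    ('A', 1, '20230103'),
    ('B', 3, '20230101'),
    ('B', 3, '20230102'),
    ('A', 1, '20230105')
  ) AS v (X, Y, Z)
)
SELECT
  X, Y, Z,
  ROW_NUMBER() OVER (PARTITION BY X ORDER BY Z) AS seq,
  -- date minus row number: identical for every row of one consecutive run
  DATEADD(DAY, -CAST(ROW_NUMBER() OVER (PARTITION BY X ORDER BY Z) AS int), Z) AS GRP
FROM table_name
ORDER BY X, Z;

-- X  Y  Z           seq  GRP
-- A  1  2023-01-01  1    2022-12-31
-- A  1  2023-01-02  2    2022-12-31
-- A  1  2023-01-03  3    2022-12-31
-- A  1  2023-01-05  4    2023-01-01
-- B  3  2023-01-01  1    2022-12-31
-- B  3  2023-01-02  2    2022-12-31
```

The skipped 4-jan shifts GRP for the last A row, so numbering by (X, GRP) restarts at 1 there.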
If column Z is not a date type (for example, a string such as '1-jan'), you can build the groups of consecutive values from the leading day number as follows:
SELECT X, Y, Z,
       ROW_NUMBER() OVER (PARTITION BY X, GRP ORDER BY Z) AS RN
FROM
(
  SELECT *,
         CAST(SUBSTRING(Z, 0, CHARINDEX('-', Z)) AS INT) -
         ROW_NUMBER() OVER (PARTITION BY X ORDER BY Z) AS GRP
  FROM table_name2
) T
ORDER BY X, Z