Partition by consecutive dates

I have a table with three columns, X, Y and Z, with X being the identifier. I want a row number partitioned by column Y that restarts whenever the dates in Z stop being consecutive. For example, I have this table:

   X    Y   Z 
   A    1   1-jan
   A    1   2-jan
   A    1   3-jan
   B    3   1-jan
   B    3   2-jan
   A    1   5-jan

The result should look like this:

   X    Y   Z      rn
   A    1   1-jan  1
   A    1   2-jan  2
   A    1   3-jan  3
   B    3   1-jan  1
   B    3   2-jan  2
   A    1   5-jan  1

The code I am using right now:

  select X, Y, Z, ROW_NUMBER() over (partition by Y order by Z) as rn

I am getting this as my result (this is not the result I want):

   X    Y   Z      rn
   A    1   1-jan  1
   A    1   2-jan  2
   A    1   3-jan  3
   B    3   1-jan  1
   B    3   2-jan  2
   A    1   5-jan  4  <---- Column Z is not 4-jan, so this should not be row 4. It should restart at row 1

CodePudding user response:

You first need to create data that can be used to partition your table.

The query below uses LAG() to determine whether a row starts a "new partition", then SUM() OVER () as a running total of that flag to make a "partition id", and finally uses ROW_NUMBER() with that identifier.

WITH
  gap_marker AS
(
  SELECT
    yourTable.*,
    IIF(
      LAG(z) OVER (PARTITION BY y ORDER BY z)
      =
      DATEADD(day, -1, z), 
      0,
      1
    )
      AS new_date_range
  FROM
    yourTable
), 
  date_range_partition AS
(
  SELECT
    gap_marker.*,
    SUM(new_date_range) OVER (PARTITION BY y ORDER BY z)   AS date_range_id
  FROM
    gap_marker
)
SELECT
  x, y, z,
  ROW_NUMBER() OVER (PARTITION BY y, date_range_id ORDER BY z)   AS rn
FROM
  date_range_partition
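
To see what the flag and the running total look like, here is a minimal self-contained sketch of the same query that also returns the intermediate columns. The table variable @sample and the year 2023 are assumptions for illustration only; the question gives neither a table name nor a year.

```sql
-- Hypothetical sample data matching the question (the year 2023 is assumed).
DECLARE @sample TABLE (x CHAR(1), y INT, z DATE);

INSERT INTO @sample (x, y, z) VALUES
  ('A', 1, '2023-01-01'),
  ('A', 1, '2023-01-02'),
  ('A', 1, '2023-01-03'),
  ('B', 3, '2023-01-01'),
  ('B', 3, '2023-01-02'),
  ('A', 1, '2023-01-05');

WITH
  gap_marker AS
(
  SELECT
    s.*,
    -- 1 when the previous row for the same y is not exactly one day earlier.
    -- LAG() is NULL on the first row, so the comparison fails and yields 1.
    IIF(
      LAG(z) OVER (PARTITION BY y ORDER BY z) = DATEADD(day, -1, z),
      0,
      1
    ) AS new_date_range
  FROM
    @sample AS s
),
  date_range_partition AS
(
  SELECT
    gap_marker.*,
    -- Running total of the flag: increases by 1 at every gap.
    SUM(new_date_range) OVER (PARTITION BY y ORDER BY z) AS date_range_id
  FROM
    gap_marker
)
SELECT
  x, y, z, new_date_range, date_range_id,
  ROW_NUMBER() OVER (PARTITION BY y, date_range_id ORDER BY z) AS rn
FROM
  date_range_partition
ORDER BY
  y, z;
```

For y = 1 the flag is 1 on 1-jan and 5-jan and 0 on the other rows, so date_range_id runs 1, 1, 1, 2 and rn comes out as 1, 2, 3, 1.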

Alternatively, you could calculate an amount to deduct from the current rn, to reset to 1 when a date is skipped.

WITH
  enumerated AS
(
  SELECT
    yourTable.*,
    ROW_NUMBER() OVER (PARTITION BY y ORDER BY z)   AS rn,
    DATEDIFF(
      day,
      LAG(z) OVER (PARTITION BY y ORDER BY z),
      z
    )
      AS delta
  FROM
    yourTable
)
SELECT
  x, y, z,
  rn - MAX(IIF(delta = 1, 0, rn - 1)) OVER (PARTITION BY y ORDER BY z) AS rn
FROM
  enumerated
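
As a rough illustration of how the deduction works, the sketch below exposes the plain row number, the day delta and the running maximum that is subtracted. The table variable @sample, the year 2023 and the column names plain_rn, offset_to_deduct and reset_rn are made up for this example.

```sql
-- Hypothetical sample data matching the question (the year 2023 is assumed).
DECLARE @sample TABLE (x CHAR(1), y INT, z DATE);

INSERT INTO @sample (x, y, z) VALUES
  ('A', 1, '2023-01-01'),
  ('A', 1, '2023-01-02'),
  ('A', 1, '2023-01-03'),
  ('B', 3, '2023-01-01'),
  ('B', 3, '2023-01-02'),
  ('A', 1, '2023-01-05');

WITH
  enumerated AS
(
  SELECT
    s.*,
    ROW_NUMBER() OVER (PARTITION BY y ORDER BY z) AS rn,
    -- Days since the previous row for the same y (NULL on the first row).
    DATEDIFF(day, LAG(z) OVER (PARTITION BY y ORDER BY z), z) AS delta
  FROM
    @sample AS s
)
SELECT
  x, y, z,
  rn AS plain_rn,
  delta,
  -- rn - 1 of the most recent row that starts a new run (delta NULL or not 1).
  MAX(IIF(delta = 1, 0, rn - 1)) OVER (PARTITION BY y ORDER BY z) AS offset_to_deduct,
  rn - MAX(IIF(delta = 1, 0, rn - 1)) OVER (PARTITION BY y ORDER BY z) AS reset_rn
FROM
  enumerated
ORDER BY
  y, z;
```

For y = 1, plain_rn is 1, 2, 3, 4 and offset_to_deduct is 0, 0, 0, 3, so reset_rn is 1, 2, 3, 1.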

Finally, you could use DATEDIFF() if your rows are always whole days apart. Window functions can be used to work out what you should compare the current row against, and avoid ROW_NUMBER() altogether.

WITH
  check_previous AS
(
  SELECT
    yourTable.*,
    IIF(
      LAG(z) OVER (PARTITION BY y ORDER BY z)
      =
      DATEADD(day, -1, z), 
      NULL,
      z
    )
      AS new_base_date
  FROM
    yourTable
)
SELECT
  x, y, z,
  DATEDIFF(
    day,
    MAX(new_base_date) OVER (PARTITION BY y ORDER BY z),
    z
  ) + 1
    AS rn
FROM
  check_previous

Demo of all three: https://dbfiddle.uk/K8x8gOqh

CodePudding user response:

Supposing that column Z is a date column, you could try the following:

SELECT X, Y, Z,
  ROW_NUMBER() OVER (PARTITION BY X, GRP ORDER BY Z) AS RN
FROM
(
  SELECT *,
    DATEDIFF(DAY, ROW_NUMBER() OVER (PARTITION BY X ORDER BY Z), Z) AS GRP
  FROM table_name
) T
ORDER BY X, Z
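
The grouping key works because, within a run of consecutive dates, the date and the row number both advance by one per row, so their difference stays constant and only changes after a gap. The sketch below shows the same idea written with DATEADD(), which is an equivalent formulation rather than the exact expression above. The table variable @sample and the year 2023 are assumptions for illustration.

```sql
-- Hypothetical sample data matching the question (the year 2023 is assumed).
DECLARE @sample TABLE (x CHAR(1), y INT, z DATE);

INSERT INTO @sample (x, y, z) VALUES
  ('A', 1, '2023-01-01'),
  ('A', 1, '2023-01-02'),
  ('A', 1, '2023-01-03'),
  ('B', 3, '2023-01-01'),
  ('B', 3, '2023-01-02'),
  ('A', 1, '2023-01-05');

SELECT
  x, y, z, grp,
  ROW_NUMBER() OVER (PARTITION BY x, grp ORDER BY z) AS rn
FROM
(
  SELECT
    s.*,
    -- Date minus its row number: constant inside a run of consecutive days.
    DATEADD(
      day,
      -CAST(ROW_NUMBER() OVER (PARTITION BY x ORDER BY z) AS INT),
      z
    ) AS grp
  FROM
    @sample AS s
) AS t
ORDER BY
  x, z;
```

With this data, the 1-jan, 2-jan and 3-jan rows for A all share the grp value 2022-12-31, while 5-jan gets 2023-01-01 and therefore starts again at rn = 1.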

If the Z column is not a date type (for example, a string such as '1-jan'), you can generate the groups of consecutive values as follows:

SELECT X, Y, Z,
  ROW_NUMBER() OVER (PARTITION BY X, GRP ORDER BY Z) AS RN
FROM
(
  SELECT *,
    CAST(SUBSTRING(Z, 0, CHARINDEX('-', Z)) AS INT) - 
     ROW_NUMBER() OVER (PARTITION BY X ORDER BY Z) AS GRP
  FROM table_name2
) T
ORDER BY X, Z

See a demo.