Home > database >  Islands and Gaps algorithm does not produce a globally unique id for each island and gap
Islands and Gaps algorithm does not produce a globally unique id for each island and gap

Time:02-16

I am using a standard Islands and Gaps algorithm to find chunks of consecutive values (ones or zeroes). The column ProductionState represent periods of time with either production or no production based on readouts from sensors connected to a machine. The relevant step is contained in this Common Table Element:

-- Production state islands with unique Id
production_state_03( Timestamp, ProductionState, ProductionStateIslandId ) as
(
    select
        Timestamp,
        ProductionState,
        row_number() over ( order by Timestamp ) - row_number() over ( partition by ProductionState order by ProductionState )
    from production_state_02
)

Resulting in the following table:

enter image description here

The problem is that the ProductionStateIslandId for each island or gap is not necessarily globally unique, which is causing errors in the later steps of the analysis. Is there a different way to calculate Islands and Gaps that will always result in globally unique Id values?

CodePudding user response:

This thing: row_number() over ( partition by ProductionState order by ProductionState )

does not make sense. All it does is create a seem-be-be-ordered, in-reality-random number.

Your gaps are unusual, since they are not really gaps, the 0-valued rows still exist. Maybe a conditional summation would help:

row_number() over ( order by Timestamp ) - sum(ProductionState) over (order by Timestamp)

CodePudding user response:

The second row_number should also order by the timestamp.

  row_number() over (order by [Timestamp]) 
  - row_number() over (partition by ProductionState 
                       order by [Timestamp])

Or

  row_number() over (order by [Timestamp]) 
    row_number() over (partition by ProductionState 
                       order by [Timestamp] DESC)

But that correction won't make it globally unique.

Another way to calculate such ranking is by summing a flag for change.

production_state_03 ([Timestamp], ProductionState, ProductionStateIslandId) as
(
    select [Timestamp], ProductionState
    , rnk = SUM(flag) over (order by [Timestamp])
    from
    (
        select [Timestamp], ProductionState
        , flag = IIF(ProductionState = LAG(ProductionState) over (order by [Timestamp]), 0, 1)
        from production_state_02
    ) q
)

This Gaps-And-Islands solution trick does need an extra subquery, but the rank will be sequencial.

  • Related