Home > Software engineering >  Get specific records in an ordered data by time but doesn't have primary key for association
Get specific records in an ordered data by time but doesn't have primary key for association

Time:01-19

Ok, the title isn't enough to explain the problem. I have the following data which gives the time and length of iron bars. There is no primary key that identifies each iron bar. However, the data is ordered by time and whenever the length is zero we know that the values after it belong to the same iron bar. The time is when the measurement is recorded, and length is the measurement. Here is the script to create the data:

DECLARE @data TABLE ([time] DATETIME, [length] FLOAT)

INSERT INTO @data
VALUES ('2023-01-15 01:00:10', 0), ('2023-01-15 02:23:15', 0), ('2023-01-15 02:23:16', 0.3334), ('2023-01-15 02:23:23', 1.11), ('2023-01-15 02:23:50', 2.225), ('2023-01-15 02:24:00', 5), ('2023-01-15 02:24:13', 5.334), ('2023-01-15 02:24:23', 5.9), ('2023-01-15 02:24:35', 6.125), ('2023-01-15 02:25:30', 6.334), ('2023-01-15 02:26:33', 7.996), ('2023-01-15 02:27:31', 10.3), ('2023-01-15 02:27:32', 0), ('2023-01-15 02:28:28', 0), ('2023-01-15 02:29:22', 0), ('2023-01-15 02:30:14', 0.66669), ('2023-01-15 02:30:15', 1.669), ('2023-01-15 02:30:25', 2.334), ('2023-01-15 02:30:30', 3), ('2023-01-15 02:30:35', 5.669), ('2023-01-15 02:30:56', 6.669), ('2023-01-15 02:31:15', 7.332), ('2023-01-15 02:31:30', 8), ('2023-01-15 02:31:40', 9.5), ('2023-01-15 02:31:44', 9.9), ('2023-01-15 02:31:55', 10), ('2023-01-15 02:32:15', 10.4), ('2023-01-15 02:32:20', 0), ('2023-01-15 02:32:30', 0.5), ('2023-01-15 02:32:44', 1), ('2023-01-15 02:32:54', 1.334)

What I'm trying to get from this data are the records when the length becomes 0, 5, 7 and 10. Usually, the length is not a whole number and has a fraction. In that case I'll take the first length close to the whole number. For example, If I have the following lengths: 4.9, 5.33, 5.5, 5.7, then I'll take the value 5.33 because it is the least number higher than 5. The same applies for the remaining lengths.

Number 0 indicates a new Iron bar, but it also can be recorded more than once for the same Bar. Here is a screenshot of the data above: 3 iron bars with lengths And this is the data I'm trying to get: enter image description here

I tried the following but this leads to data to get scrambled. Just added this since stackoverflow requests it but obviously not what I'm looking for.

SELECT MIN([time]), FLOOR([length])
FROM @data
WHERE FLOOR([length]) IN (0, 5, 7, 10)
GROUP BY FLOOR([length])

Thanks.

CodePudding user response:

Cool question, appeared much more complicated then it seems to be :)

;with candidates as (
    select 
        *
    from @data 
    where floor([length]) in (0, 5, 7, 10)
)
, candidatesWithPrevLength as (
    select 
        *
        , lag([length]) over (order by [time]) as lengthPrevious
    from candidates     
), candidatesWithGroupStarts as (
    select
        *
        , case when [length] < lengthPrevious or lengthPrevious is null then 1 else null end as 'groupStart'
    from candidatesWithPrevLength
), groups as (
    select
        [time]
        , lead([time]) over (order by [time]) as validUntil
        , row_number() over (order by [time]) as groupId
    from candidatesWithGroupStarts  
    where groupStart = 1
), candidatesInGroups as (
    select 
        c.* 
        , g.groupId
    from candidates c, groups g
    where c.[time] >= g.[time] and (c.[time] < g.validUntil or g.validUntil is null)
), candidatesInGroupsWithOrders as (
    select 
        * 
        , row_number() over (partition by groupId, floor([length]) order by [time]) as OrderNo
    from candidatesInGroups
)
select [time], [length] from candidatesInGroupsWithOrders where orderno = 1

Surprisingly, I had to use a bunch of SQL mechanisms to achieve this goal: CTE-s, join, window functions

Will try to describe the idea in brief:

  • find candidate bars (ones who match initial conditions)
  • calculate groups of candidates (based on 0-to-10 length checks)
  • split all candidates into groups
  • select needed bars from these groups

That's just a possible option. Definitely, a bit "dirty". Not sure how it will work with huge amounts of data. Definitely could be simplified and optimized. But it works and could be a good "starting point"

  • Related