Count number of records that come after a specific record with different settings-CodePudding

I am looking to create a column that represents how many records come after each record that have different 'settings', or in this case, that don't have the exact same column value 'p' in the test case below:

p	orderLevel
Jack	.03
Jack / Jill	.03.01
Jack / Jill / Bob	.03.01.01
------	.03.01.05
------	.03.01.06
Jack / Jill / Robert	.03.01.11
------	.03.01.15
Julie / Josh	.04.02
Julie / Josh / Tom	.04.02.01
Julie / Fred	.04.03
------	.04.03.06
------	.04.03.07

With a result table looking something like:

p	orderLevel	numAfter
Jack	.03	11
Jack / Jill	.03.01	10
Jack / Jill / Bob	.03.01.01	9
------	.03.01.05	4
------	.03.01.06	4
Jack / Jill / Robert	.03.01.11	6
------	.03.01.15	3
Julie / Josh	.04.02	4
Julie / Josh / Tom	.04.02.01	3
Julie / Fred	.04.03	2
------	.04.03.06	0
------	.04.03.07	0

where we see that the numAfter column represents how many records there are with a larger 'orderLevel' string and a different 'p' for each record in the table. The reason for doing this is now I can group by 'p' and 'numAfter' to only select one separator record 'p = ------ ' when it is the case that there are multiple following one another. I'm sure there are also other ways to accomplish the goal of only selecting one separator record if there are multiple following each other, so feel free to share those methods as well. But, I am hoping someone can provide the method of creating the 'numAfter' column as I have specified. I don't think it should be too difficult.

See below for the temp table and the results I have so far.

if object_id('tempdb..#t1') is not null drop table #t1
CREATE TABLE #t1 (p varchar(150), orderLevel varchar(150))
INSERT into #t1 VALUES 
   ('Jack', '.03'),
   ('Jack / Jill', '.03.01'),
   ('Jack / Jill / Bob', '.03.01.01'),
   ('------', '.03.01.05'),
   ('------', '.03.01.06'),
   ('Jack / Jill / Robert', '.03.01.11'),
   ('------', '.03.01.15'),
   ('Julie / Josh', '.04.02'),
   ('Julie / Josh / Tom', '.04.02.01'),
   ('Julie / Fred', '.04.03'),
   ('------', '.04.03.06'),
   ('------', '.04.03.07');



     select  t1.p,
             t1.orderLevel,
             count(t2.p) as numAfter
        from #t1 t1
        inner join #t1 t2
            on t2.orderLevel > t1.orderLevel
            and t1.p != t2.p
        group by t1.p, t1.orderLevel

CodePudding user response：

You need to use a left join instead of inner join. The left join will join the unmatched rows from the left table with a null values ensuring that all the left table rows are selected.

select T.p, T.orderLevel, count(D.p)
from #t1 T left join #t1 D
on T.p <> D.p and
   T.orderLevel < D.orderLevel
group by T.p, T.orderLevel 
order by T.orderLevel

As I understood from your question, you want to create groups for similar consecutive P values, if so, another simple option is to use a difference between two row_numbers as the following:

select *,
   row_number() over (order by orderLevel) -
   row_number() over (partition by p order by orderLevel) grp
from #t1
order by orderLevel

See demo for both queries.

For more details about this problem you may try to google 'gaps and islands problem in SQL'.

CodePudding user response：

Depending on which version of SQL Server you are running, there are various ways to access the previous row for this inspection.

Here is an example using an ordered row index (you should already have this as an identity on your actual table):

with cteOrdered as (
    select *, row_number() over(order by orderLevel) as [r]
    from #t1
)
select a.p, a.orderLevel
from cteOrdered a
join cteOrdered b on a.r-1 = b.r
-- remove any separator rows where previous row is also a separator
where iif(a.p=b.p and b.p='------', 1, 0) = 0;

More recent versions of SQL Server have the LAG() function which will work as well.