Home > other >  Reduce duplicate records like a DISTINCT , result set must be in correct order
Reduce duplicate records like a DISTINCT , result set must be in correct order

Time:07-04

The original data set looks like this (but it is a lot longer)

Id Bcolumn
123 5
124 5
125 6
126 7
127 7
128 8
129 8
130 8
131 8
132 5
133 5

I need to make something like a DISTINCT on this set, on Bcolumn, but the result set's order has to be the same order as the default set. AND the 5's have to be in the beginning of the set and the end of the set too. So basically the result would look like this.

Id Bcolumn
124 5
125 6
126 7
130 8
133 5

The Id column's values are not important in my case, so the result's first set is 124/5 but 123/5 would also work for me. I just added the ID column because I was thinking to use the Id column somehow in a fancy SELECT.

Originally I was thinking about using DISTINCT, but I realized that the result set would not include both records with 5 on the Bcolumn.

So currently my best bet would be something like a fetch-next/cursor, creating a temporary table, but that would be big and ugly, I was hoping that this can be done in nicer way.

Note - I am using MS SQL 2008, but I can upgrade the database if needed.

CodePudding user response:

Looking at your data it appears you might need a solution to identify sequential groups to include in your aggregation.

Does the following provide a usable solution?

with g as (
select *,  row_number() over(partition by b order by id) r, row_number() over(order by id) rn
from t
)
select max(id) Id, b
from g
group by b, rn-r
order by Id;

See example DB<>Fiddle

CodePudding user response:

Assuming the following table...

SELECT
    Id,
    Bcolumn
INTO
    #T
FROM
    (VALUES 
        (123, 5),
        (124, 5),
        (125, 6),
        (126, 7),
        (127, 7),
        (128, 8),
        (129, 8),
        (130, 8),
        (131, 8),
        (132, 5),
        (133, 5)
    ) T(Id, Bcolumn);

...the following query...

SELECT
    *
FROM
    #T T_CURRENT
WHERE
    Bcolumn NOT IN (
        SELECT TOP 1
            Bcolumn
        FROM
            #T T_NEXT
        WHERE
            T_CURRENT.Id < T_NEXT.Id
        ORDER BY
            T_NEXT.Id
    )
ORDER BY
    Id;

...gives the following result:

Id Bcolumn
124 5
125 6
127 7
131 8
133 5

The Ids are not exactly the same as in your example, but I think you were aiming at something like this.

CodePudding user response:

It is a little hard to tell exactly what you want, but this produces the results you specify:

select bcolumn, min(id)
from t
group by bcolumn
union all
select 5, max(id)
from t
where bcolumn = 5;
  • Related