Home > Software engineering >  Join onto the same table to fix incorrect data
Join onto the same table to fix incorrect data

Time:07-29

I am wanting to create a report that shows incorrect data and I am needing to find out a way to join on a table that has similar data but different identifiers. The identifiers need to be the same. So I'd like to use MAX() to get the highest number identifier and then update the records to reflect that.

So my data looks sort of like this:

ID        PRE   IDENTIFIER
123456789 ABC   1176109555
123456789 ABC   1430842555
123456789 ABC   1572290555
123456789 ABC   1651845555
123456789 ABC   2099393555
111111111 ABC   5555393555

So if an ID has more than one identifier I'd like for it t be returned like so:

IDENTIFIER CORRECTED IDENTIFIER
1176109555 2099393555
1430842555 2099393555
1572290555 2099393555
1651845555 2099393555

left column excludes the max() identifier while the right column would have that max () identifier for each other one listed. The 111111111 is not included because it does not have more than one identifier

CodePudding user response:

Use a CTE to obtain the max id per user:

    with max_identifier as (
    select id, max(identifier) as corrected_identifier
    from my_table
    group by id 
    )
    select t.id, t.pre, t.identifier, m.corrected_identifier
    from my_table t
    join max_identifier m
      on t.id = m.id 

Just realized you mentioned sql server. Someone may need to correct me on this syntax, sorry.

    with max_identifier (id, corrected_identifier) 
    as (
    select id, max(identifier) as corrected_identifier
    from my_table
    group by id 
    )
    select t.id, t.pre, t.identifier, m.corrected_identifier
    from my_table t
    join max_identifier m
      on t.id = m.id 

CodePudding user response:

Try the following:

Select * From 
(
   Select IDENTIFIER,
   Max(IDENTIFIER) Over (Partition By ID) as CORRECTED_IDENTIFIER
   From your_table
) D
Where D.IDENTIFIER<>CORRECTED_IDENTIFIER

The condition D.IDENTIFIER<>CORRECTED_IDENTIFIER in the Where clause will ensure that the max IDENTIFIER value is not returned and IDENTIFIER with no more than one value is not returned also.

If you want to update the original table, use the following CTE:

With CTE As
(
   Select * From 
  (
    Select IDENTIFIER,
    Max(IDENTIFIER) Over (Partition By ID) as CORRECTED_IDENTIFIER
    From your_table
  ) D
  Where D.IDENTIFIER<>CORRECTED_IDENTIFIER
)

Update CTE Set IDENTIFIER = CORRECTED_IDENTIFIER;

See a demo from db<>fiddle.

  • Related