Oracle SQL Uniquely Update Duplicate Records-CodePudding

I have a STUDENT table and need to update the STUDENT_ID values by prefixing with the letter SS followed by STUDENT_ID value. For any duplicate STUDENT_ID records, I should prefix the duplicate records as SS1 SS2. Below is an example

Before Update:

NUM	STUDENT_ID
1	9234
2	9234
3	9234
4	3456
5	3456
6	789
7	956

After Update:

NUM	STUDENT_ID
1	SS9234
2	SS19234
3	SS29234
4	SS3456
5	SS13456
6	SS789
7	SS956

Below is the query for updating the STUDENT_ID for unique records.

update student set student_id = 'SS'||student_id ;
commit;

Need suggestion for updating the STUDENT_ID for duplicate records. There are around 1 million duplicate records in the table and total volume is around 40 million. Appreciate for any inputs for performance enhancement.

CodePudding user response：

You can use a MERGE statement correlated on the ROWID pseudo-column and using the ROW_NUMBER() analytic function:

MERGE INTO table_name dst
USING (
  SELECT ROWID as rid,
         ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY num) AS rn
  FROM   table_name
) src
ON (src.rid = dst.ROWID)
WHEN MATCHED THEN
  UPDATE
  SET student_id = 'SS' || CASE WHEN rn > 1 THEN rn - 1 END || dst.student_id;

Which, for the sample data:

CREATE TABLE table_name (NUM, STUDENT_ID) AS
SELECT 1, CAST('9234' AS VARCHAR2(20)) FROM DUAL UNION ALL
SELECT 2, '9234' FROM DUAL UNION ALL
SELECT 3, '9234' FROM DUAL UNION ALL
SELECT 4, '3456' FROM DUAL UNION ALL
SELECT 5, '3456' FROM DUAL UNION ALL
SELECT 6, '789'  FROM DUAL UNION ALL
SELECT 7, '956'  FROM DUAL;

Then after the MERGE the table contains:

NUM	STUDENT_ID
1	SS9234
2	SS19234
3	SS29234
4	SS3456
5	SS13456
6	SS789
7	SS956

fiddle

CodePudding user response：

I'm sure there must be a better way, but this query can get the job done:

update t
set student_id = (
  select new_student_id
  from (
    select x.*, 'SS' || case when rn = 1 then '' else '' || rn end 
      || student_id as new_student_id
    from (
      select t.*, row_number() over(partition by student_id order by num) as rn
      from t
    ) x
  ) y
  where t.num = y.num
)

Result:

 NUM  STUDENT_ID 
 ---- ---------- 
 1    SS9234     
 2    SS29234    
 3    SS39234    
 4    SS3456     
 5    SS23456    
 6    SS789      
 7    SS956

See running example at db<>fiddle.

CodePudding user response：

Maybe you could do it without updating!?

I would probably try to :

    CREATE NEW_TABLE AS 
    SELECT [do the "update" here] FROM OLD_TABLE;

- add indexes on new table
- add constraints on new table
- add anything else you need on new table (foreign keys, grants...)

and then

DROP TABLE OLD_TABLE; 
--  and 
RENAME NEW_TABLE To OLD_TABLE;

SELECT with your sample data:

WITH
    tbl as
        (
            Select 1 "NUM", 9234 "STUDENT_ID" From Dual Union All
            Select 2 "NUM", 9234 "STUDENT_ID" From Dual Union All
            Select 3 "NUM", 9234 "STUDENT_ID" From Dual Union All
            Select 4 "NUM", 3456 "STUDENT_ID" From Dual Union All
            Select 5 "NUM", 3456 "STUDENT_ID" From Dual Union All
            Select 6 "NUM", 789  "STUDENT_ID" From Dual Union All
            Select 7 "NUM", 956  "STUDENT_ID" From Dual  
        )
Select
    NUM, 
    CASE WHEN Count(NUM) Over(Partition By STUDENT_ID) = 1 THEN 'SS' || STUDENT_ID 
    ELSE 'SS' || Replace(Sum(1) Over(Partition By STUDENT_ID Order By NUM) - 1, 0, '') || STUDENT_ID
  END "STUDENT_ID"
From    
    tbl
Order By NUM

Result:

NUM	STUDENT_ID
1	SS9234
2	SS19234
3	SS29234
4	SS3456
5	SS13456
6	SS789
7	SS956