Adding a sequential value to a column which has duplicate values

Time:11-18

I have a table in Postgres with a column that is supposed to hold distinct alphanumeric values in the pattern 1234P001. However, due to a bug, the column contains duplicates, e.g. 1234P001 appears three times.

I want to replace the duplicate 1234P001's with 1234P002, 1234P003 and 1234P004. How can I do this in PostgreSQL?

I tried using a sequence, but it didn't work.

CodePudding user response:

This can be done with a temporary table and the row_number window function. Here is an illustration.

-- Prepare a test case
create table the_table (id integer, the_column text);
insert into the_table values 
(1, '1234P001'), 
(2, '1235P001'), 
(3, '1234P001'), 
(4, '1236P001'), 
(5, '1235P001'), 
(6, '1234P001');


create temporary table the_temp_table as 
 select *, row_number() over (partition by the_column order by id) ord 
 from the_table ;

update the_temp_table
 set the_column = the_column||'.'||ord::text where ord > 1;

truncate table the_table;

insert into the_table(id, the_column)
 select id, the_column from the_temp_table;

select * from the_table order by the_column;
id the_column
1 1234P001
3 1234P001.2
6 1234P001.3
2 1235P001
5 1235P001.2
4 1236P001
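
As a possible variation on the approach above, the same renumbering can be sketched as a single UPDATE that joins the table to a CTE, which avoids the temporary table and the truncate. This is only a sketch: it assumes that id uniquely identifies each row of the_table, which holds for the test case but is not stated in the question.

```sql
-- Sketch only: assumes the_table.id is unique per row.
with numbered as (
  -- number the rows inside each group of equal the_column values
  select id,
         row_number() over (partition by the_column order by id) as ord
  from the_table
)
update the_table t
   set the_column = t.the_column || '.' || n.ord::text
  from numbered n
 where n.id = t.id
   and n.ord > 1;   -- the first row of each group keeps its value
```

Run against the test case, this should leave the table in the same state as the temporary-table version.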

CodePudding user response:

Using this sample data to illustrate the concept

create table tab (id varchar(8) );
insert into tab(id) values 
('1234P001'), 
('1234P001'), 
('1234P001'), 
('1234P002'), 
('1234P004'), 
('1234P004'),
('1234P005');

First, identify the duplicated keys using count(*) over (partition by ...)

select id,
count(*) over (partition by id) > 1  is_dup
from tab;

id      |is_dup|
--------+------+
1234P001|true  |
1234P001|true  |
1234P001|true  |
1234P002|false |
1234P004|true  |
1234P004|true  |
1234P005|false |

Assign each duplicated row a unique sequence number (you'll see soon why)

with dup as (
select id,
count(*) over (partition by id) > 1  is_dup
from tab
)
select id,  
row_number() over (order by id) dup_idx
from dup
where is_dup;

id      |dup_idx|
--------+-------+
1234P001|      1|
1234P001|      2|
1234P001|      3|
1234P004|      4|
1234P004|      5|
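
Note that the query above numbers every copy of a duplicated key, including the first one, so none of the rows would keep the original value. If one copy per key should stay unchanged, a possible variation (a sketch, not part of the original answer; the CTE name numbered and the column rn are made up for illustration) is to number the copies within each key and take only the second and later ones:

```sql
-- Sketch: treat only the 2nd, 3rd, ... copy of each key as a duplicate.
with numbered as (
  select id,
         row_number() over (partition by id order by id) as rn
  from tab
)
select id,
       row_number() over (order by id) as dup_idx
from numbered
where rn > 1;
```

With the sample data this returns three rows instead of five: two for 1234P001 and one for 1234P004.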

Now generate all keys that do not exist yet, based on your key schema (here a prefix of length 5 followed by a 3-digit integer)

with free_key as (
select distinct substring(id,1,5)||lpad(idx::text,3,'0') id 
from tab
cross join generate_series(1,10) as t(idx) /* increase the count up to 999 if required */
except 
select id from tab)
select id,
row_number() over (order by id) free_id_idx
from free_key;

id      |free_id_idx|
--------+-----------+
1234P003|          1|
1234P006|          2|
1234P007|          3|
1234P008|          4|
1234P009|          5|
1234P010|          6|

In the last step, simply join the duplicated keys with the unassigned keys on their sequence numbers to map each old_id to a unique new_id

Note that I use an outer join: if new_id comes back empty (NULL), there is no free key left in your schema to resolve that duplicate.

with dup as (
select id,
count(*) over (partition by id) > 1  is_dup
from tab
),
dup2 as (
select id,  
row_number() over (order by id) dup_idx
from dup
where is_dup),
free_key as (
select distinct substring(id,1,5)||lpad(idx::text,3,'0') id 
from tab
cross join generate_series(1,10) as t(idx) /* increase the count up to 999 if required */
except 
select id from tab),
free_key2 as (
select id,
row_number() over (order by id) free_id_idx
from free_key)
select dup2.id old_id, free_key2.id new_id
from dup2
left outer join free_key2
on dup2.dup_idx = free_key2.free_id_idx;

old_id  |new_id  |
--------+--------+
1234P001|1234P003|
1234P001|1234P006|
1234P001|1234P007|
1234P004|1234P008|
1234P004|1234P009|
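
The result above is only a mapping; the rows still have to be updated. Because tab has no column that tells two identical rows apart, the sketch below uses the PostgreSQL system column ctid as a row reference within a single statement. It departs from the queries above in two labeled ways: it leaves the first copy of each key untouched, and it matches duplicates to free keys per prefix, so a row never receives a key with a different prefix. The names numbered, row_ref, prefix and rn are made up for illustration.

```sql
-- Sketch only: ctid is used as a temporary row reference inside one statement.
with numbered as (
  select ctid as row_ref,
         id,
         substring(id, 1, 5) as prefix,
         row_number() over (partition by id order by ctid) as rn
  from tab
),
dup2 as (
  -- 2nd and later copies, numbered separately for each prefix
  select row_ref, prefix,
         row_number() over (partition by prefix order by id, row_ref) as dup_idx
  from numbered
  where rn > 1
),
free_key as (
  -- every key of the schema (prefix + 3 digits) that is not in use yet
  select p.prefix, p.prefix || lpad(t.idx::text, 3, '0') as id
  from (select distinct prefix from numbered) as p
  cross join generate_series(1, 999) as t(idx)
  except
  select substring(id, 1, 5), id from tab
),
free_key2 as (
  select prefix, id,
         row_number() over (partition by prefix order by id) as free_id_idx
  from free_key
)
update tab
   set id = free_key2.id
  from dup2
  join free_key2
    on free_key2.prefix = dup2.prefix
   and free_key2.free_id_idx = dup2.dup_idx
 where tab.ctid = dup2.row_ref;
```

With the sample data, the two extra copies of 1234P001 should become 1234P003 and 1234P006, and the extra copy of 1234P004 should become 1234P007. Since this is an inner join, a duplicate for which no free key is left stays unchanged, so it is worth re-running the duplicate check afterwards.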