I have a table in Postgres with a column that has distinct alphanumeric values in the pattern 1234P001. However, due to some bug, there are duplicate values in the column, like 1234P001
appearing thrice.
I want to replace duplicate 1234P001
's with 1234P002
, 1234P003
and 1234P004
. How can I do this in PostgresSql?
I tried using sequence but it didn't work.
CodePudding user response:
This can be done with a temporary table and the use of row_number
window function. Here is an illustration.
-- Prepare a test case
create table the_table (id integer, the_column text);
insert into the_table values
(1, '1234P001'),
(2, '1235P001'),
(3, '1234P001'),
(4, '1236P001'),
(5, '1235P001'),
(6, '1234P001');
create temporary table the_temp_table as
select *, row_number() over (partition by the_column order by id) ord
from the_table ;
update the_temp_table
set the_column = the_column||'.'||ord::text where ord > 1;
truncate table the_table;
insert into the_table(id, the_column)
select id, the_column from the_temp_table;
select * from the_table order by the_column;
id | the_column |
---|---|
1 | 1234P001 |
3 | 1234P001.2 |
6 | 1234P001.3 |
2 | 1235P001 |
5 | 1235P001.2 |
4 | 1236P001 |
CodePudding user response:
Using this sample data to illustrate the concept
create table tab (id varchar(8) );
insert into tab(id) values
('1234P001'),
('1234P001'),
('1234P001'),
('1234P002'),
('1234P004'),
('1234P004'),
('1234P005');
First you need to identify the duplicated key - use count .. over
select id,
count(*) over (partition by id) > 1 is_dup
from tab;
id |is_dup|
-------- ------
1234P001|true |
1234P001|true |
1234P001|true |
1234P002|false |
1234P004|true |
1234P004|true |
1234P005|false |
Assign each duplicated row a unique sequence number (you'll see soon why)
with dup as (
select id,
count(*) over (partition by id) > 1 is_dup
from tab
)
select id,
row_number() over (order by id) dup_idx
from dup
where is_dup;
id |dup_idx|
-------- -------
1234P001| 1|
1234P001| 2|
1234P001| 3|
1234P004| 4|
1234P004| 5|
Now generate all not existing keys based on you key schema (here prefix of length 5 and 3 digit integer)
with free_key as (
select distinct substring(id,1,5)||lpad(idx::text,3,'0') id
from tab
cross join generate_series(1,10) as t(idx) /* increase the count up to 999 if required */
except
select id from tab)
select id,
row_number() over (order by id) free_id_idx
from free_key
id |free_id_idx|
-------- -----------
1234P003| 1|
1234P006| 2|
1234P007| 3|
1234P008| 4|
1234P009| 5|
1234P010| 6|
In the last step simple join the table with duplicated keys with the unassigned key using teh unique index to get the resolution old_id
and the unique new_id
Note I use an outer join - if you get an empty new_id
there is a problem you have no free key to fix in your schema.
with dup as (
select id,
count(*) over (partition by id) > 1 is_dup
from tab
),
dup2 as (
select id,
row_number() over (order by id) dup_idx
from dup
where is_dup),
free_key as (
select distinct substring(id,1,5)||lpad(idx::text,3,'0') id
from tab
cross join generate_series(1,10) as t(idx) /* increase the count up to 999 if required */
except
select id from tab),
free_key2 as (
select id,
row_number() over (order by id) free_id_idx
from free_key)
select dup2.id old_id, free_key2.id new_id
from dup2
left outer join free_key2
on dup2.dup_idx = free_key2.free_id_idx;
old_id |new_id |
-------- --------
1234P001|1234P003|
1234P001|1234P006|
1234P001|1234P007|
1234P004|1234P008|
1234P004|1234P009|