I'm transforming a set of distinct id_product values
(strings containing letters and numbers) into a numeric ID (each value's own row number, in this case) and then joining this new numeric column back into a large dataset with multiple IDs, as follows:
with cte as (
select distinct id_product,
row_number() over () as id_product2
from tb_market_data
)
select t1.id_customer,
t1.id_product,
t2.id_product2
from tb_market_data as t1
left join cte as t2 on t1.id_product = t2.id_product
Although this method works, I have a large dataset, and joining those two tables using a string as the key is consuming all available memory. Is there a way to make the processing less memory-hungry?
PS: I can't simply remove all the letters from id_product,
because without them two different products could collapse into the same ID (e.g. X001 and B001 would both become 001).
CodePudding user response:
Since the logic that assigns the new ID is not deterministic, you are free to assign it in any way.
You can use RANK() or DENSE_RANK() for this purpose and avoid the join.
For example:
select
id_customer,
id_product,
dense_rank() over(order by id_product) as id_product2
from tb_market_data
In terms of performance, this can be made faster if there is an index on id_product,
since the window function has to sort by this column.
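As a quick way to verify the idea, here is a minimal sketch that runs the DENSE_RANK() query against an in-memory SQLite database (requires SQLite 3.25+ for window function support). The table and column names come from the question; the sample rows are made up for illustration:

```python
import sqlite3

# Build a tiny stand-in for tb_market_data with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("create table tb_market_data (id_customer text, id_product text)")
conn.executemany(
    "insert into tb_market_data values (?, ?)",
    [("c1", "X001"), ("c2", "B001"), ("c3", "X001"), ("c4", "A250")],
)

# DENSE_RANK() assigns one stable numeric ID per distinct id_product
# in a single pass, with no self-join on the string key.
rows = conn.execute(
    """
    select id_customer,
           id_product,
           dense_rank() over (order by id_product) as id_product2
    from tb_market_data
    """
).fetchall()

for r in rows:
    print(r)
```

Note that duplicate id_product values (X001 above) receive the same numeric ID, which is exactly what the row_number-plus-join approach produced, just without materializing and probing the CTE.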