I'm transforming a set of distinct id_product values
(strings containing letters and numbers) into a numeric ID (each value's own row number, in this case) and then joining this new numeric column back into a large dataset with multiple IDs, as follows:
with cte as (
select distinct id_product,
row_number() over () as id_product2
from tb_market_data
)
select t1.id_customer,
t1.id_product,
t2.id_product2
from tb_market_data as t1
left join cte as t2 on t1.id_product = t2.id_product
Although this method works, I have a large dataset, and joining those two tables using a string as the key is consuming all available memory. Is there a way to make the processing less memory-hungry?
PS: I can't simply remove all the letters from id_product,
because without them two different products could collapse into the same ID (e.g. X001 and B001 would both become 001).
CodePudding user response:
Since the logic that assigns the new ID is not deterministic, you are free to assign it in any way.
You can use RANK() or DENSE_RANK() for this purpose and avoid the join.
For example:
select
id_customer,
id_product,
dense_rank() over(order by id_product) as id_product2
from tb_market_data
In terms of performance, this can be made faster if there is an index on id_product,
since the window function has to sort by this column.
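As a quick way to verify the idea, here is a minimal sketch that runs the DENSE_RANK() query against an in-memory SQLite database (requires SQLite 3.25+ for window function support). The table and column names come from the question; the sample rows are made up for illustration:

```python
import sqlite3

# Build a tiny stand-in for tb_market_data with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("create table tb_market_data (id_customer text, id_product text)")
conn.executemany(
    "insert into tb_market_data values (?, ?)",
    [("c1", "X001"), ("c2", "B001"), ("c3", "X001"), ("c4", "A250")],
)

# DENSE_RANK() assigns one stable numeric ID per distinct id_product
# in a single pass, with no self-join on the string key.
rows = conn.execute(
    """
    select id_customer,
           id_product,
           dense_rank() over (order by id_product) as id_product2
    from tb_market_data
    """
).fetchall()

for r in rows:
    print(r)
```

Note that duplicate id_product values (X001 above) receive the same numeric ID, which is exactly what the row_number-plus-join approach produced, just without materializing and probing the CTE.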