Home > Mobile >  Transforming a a string ID into numeric and then joining into a large dataset without consuming all
Transforming a a string ID into numeric and then joining into a large dataset without consuming all

Time:06-30

I'm transforming a set of distinct id_product (string with letters and numbers) into a numeric id (its own row number in this case) and then joining this new numeric column into a large dataset with multiple IDs, as it follows:

with cte as (select distinct id_product,
                    row_number() over () as id_product2
             from tb_market_data
select t1.id_customer,
       t1.id_product,
       t2.id_product2
from tb_market_data as t1
left join cte as t2 on t1.id_product = t2.id_product

Although this method works, since I have a large dataset, joining those two tables using a string as a key is consuming all memory. There's a way to make the processing less memory hunger?

PS: I can't simple remove all the letters from id_product because the lack off then would result into two different products being the same (e.g. X001 and B001 would be equally 001)

CodePudding user response:

Since the logic to assign the new ID is not deterministic you have freedom to assign it in any way.

You can use RANK() or DENSE_RANK() for this purpose and avoid the join.

For example:

select
  id_customer,
  id_product,
  dense_rank() over(order by id_product) as id_product2
from tb_market_data

In terms of performance, this can made faster if there was an index by id_product and if the query sorted by this column.

  • Related