I am trying to extract the number present in a string in which the string comes in a different way. The String I receive and expected output is mentioned below.
PRODUCT_DESCRIPTION | EXPECTED PACK SIZE | CURRENT_RESULT |
---|---|---|
PRODUCT A 3 CHEESE SLICE 170 GM | 170 | 3170 |
PRODUCT B SUGAR 1.3KG (CL) | 1300 | 13 |
PRODUCT C CHEESE SLICES 12X156GM | 156 | 12156 |
PRODUCT KETCHUP BOTTLE 200GM (CL) | 200 | 200 |
PRODUCT KETCHUP 1.3KG (CL) | 1300 | 13 |
I tried using below transformation in Snowflake SQL, but it is only extracting all the numerical literals from the string.
REGEXP_REPLACE(SPLIT_PART(UPPER(PRODUTC_DESCRIPTION),'GM',1),'[^[:digit:]]')
I need to run the code in Snowflake, Any help would be appreciated.
Thanks
CodePudding user response:
Something like this might be helpful:
select PRODUTC_DESCRIPTION,
TRIM( regexp_substr( PRODUTC_DESCRIPTION, '([\\.\\d ]*\\d *)((KG|K|GM))',1,1,'e')) as packsize_num,
TRIM( regexp_substr( PRODUTC_DESCRIPTION, '[\\.\\d ]*\\d *((KG|K|GM))',1,1,'e')) as packsize_unit,
IFF( packsize_unit IN ('K','KG'), packsize_num * 1000, packsize_num ) packsize
from values
('PRODUCT A 3 CHEESE SLICE 170 GM'),
('PRODUCT B SUGAR 1.3KG (CL)'),
('PRODUCT C CHEESE SLICES 12X156GM'),
('PRODUCT KETCHUP BOTTLE 200GM (CL)'),
('PRODUCT KETCHUP 1.3KG (CL)') tmp(PRODUTC_DESCRIPTION);
----------------------------------- -------------- --------------- ----------
| PRODUTC_DESCRIPTION | PACKSIZE_NUM | PACKSIZE_UNIT | PACKSIZE |
----------------------------------- -------------- --------------- ----------
| PRODUCT A 3 CHEESE SLICE 170 GM | 170 | GM | 170 |
| PRODUCT B SUGAR 1.3KG (CL) | 1.3 | KG | 1,300 |
| PRODUCT C CHEESE SLICES 12X156GM | 156 | GM | 156 |
| PRODUCT KETCHUP BOTTLE 200GM (CL) | 200 | GM | 200 |
| PRODUCT KETCHUP 1.3KG (CL) | 1.3 | KG | 1,300 |
----------------------------------- -------------- --------------- ----------