I'm working on the ETL for a warehouse management system and I've come across a warehouse location map that's all over the place in terms of naming conventions so my usual method of parsing the warehouse location into warehouse location and warehouse location level isn't working. I think REGEXP can be used here, but I don't know it that well.
By looking at the last 5 characters, you can tell if it needs to be parsed. Here's the logic
*If First 3 of the last 5 are numbers and Last 2 are letters. Return anything left of last 2 as "ETL Location". Return last 2 as "Level ID".
If First 2 of the last 5 are numbers and 3rd character of last 5 is letter and Last 2 charcters are numbers. Return anything left of last 3 as "ETL Location". Return last 3 as "Level ID"
If first 4 of last 5 are numbers and last character is letter. Return anything left of last character as "ETL Location". Return last 1 as "Level ID"*
This is the desired result.
Given Location | ETL Location | Level |
---|---|---|
A103AB | A103 | AB |
A103A01 | A103 | A01 |
A103A02 | A103 | A02 |
A103456 | A103456 | null |
A103A | A103 | A |
A103B | A103 | B |
ABCDEFG | ABCDEFG | null |
Can Regexp be used here?
I've done the following, but I'm still getting incorrect results since there's a lot of different naming conventions. If I do it this way and the warehouse makes new locations, logic like this will probably get broken at some point even if it works currently so I'm thinking identifying what is a "Level" and going from there is a better approach here.
case
when loc_tp_code in ('RACK','PICK','RKL2','PLNW')
and length(loc_code) = '5'
THEN substr(loc_code,1,4)
when loc_tp_code in ('RACK','PICK','RKL2','PLNW')
and length(loc_code) = '7'
THEN substr(loc_code,1,6)
when loc_tp_code in ('RACK','PICK','RKL2','PLNW')
and length(loc_code) = '6'
THEN substr(loc_code,1,5)
when loc_tp_code in ('RACK','PICK','RKL2','PLNW')
and length(loc_code) = '8'
and loc_code like 'Z%'
THEN substr(loc_code,1,6)
when loc_tp_code in ('RACK','PICK','RKL2','PLNW')
and length(loc_code) = '8'
THEN substr(loc_code,1,5)
ELSE loc_code
END AS "Location",
case
when loc_tp_code in ('RACK','PICK','RKL2','PLNW')
and length(loc_code) = '5'
THEN substr(loc_code,5,1)
when loc_tp_code in ('RACK','PICK','RKL2','PLNW')
and length(loc_code) = '7'
THEN substr(loc_code,7,1)
when loc_tp_code in ('RACK','PICK','RKL2','PLNW')
and length(loc_code) = '6'
THEN substr(loc_code,6,1)
when loc_tp_code in ('RACK','PICK','RKL2','PLNW')
and length(loc_code) = '8'
and loc_code like 'Z%'
THEN substr(loc_code,7,2)
when loc_tp_code in ('RACK','PICK','RKL2','PLNW')
and length(loc_code) = '8'
THEN substr(loc_code,6,3)
ELSE null
END AS "Level ID"
CodePudding user response:
Something like 1 CASE for "ETL location" and 1 for "LEVEL":
case substr(location,-5)
when regexp_substr( substr(location,-5), '\d{3}[A-Z]{2}' ) then
substr(location, 1, length(location)-5) || substr(location, length(location)-4,3)
when regexp_substr( substr(location,-5), '\d{2}[A-Z]\d{2}' ) then
substr(location, 1, length(location)-5) || substr(location, length(location)-4,3)
when
...
end as etl_location,
case substr(location,-5)
when regexp_substr( substr(location,-5), '\d{3}[A-Z]{2}' ) then
substr(location, -2)
when regexp_substr( substr(location,-5), '\d{2}[A-Z]\d{2}' ) then
substr(location, -3)
when
...
end as lvl
CodePudding user response:
Based on your test data, I think you can simplify your rule. Here I describe the given location as a regular expression, grouping the components (with parenthesis), then returning them based on the rule of etl-location starts with a letter, then is followed by either all letters to the end OR one or more digits to the end OR one or more digits followed by a letter and whatever else until the end (made optional to hande the cases of NULL warehouse level). In the case of an etl_location, the first group is returned and for the level the 4th group is returned.
You have to study your data to see if this indeed matches your naming rules.
with tbl(data) as (
select 'A103AB' from dual union all
select 'A103A01' from dual union all
select 'A103A02' from dual union all
select 'A103456' from dual union all
select 'A103A' from dual union all
select 'A103B' from dual union all
select 'ABCDEFG' from dual
)
select data given_loc,
regexp_substr(data, '(([A-Z])([A-Z] $|\d $|\d ))([A-Z].*)?', 1, 1, null, 1) ETL_loc,
regexp_substr(data, '(([A-Z])([A-Z] $|\d $|\d ))([A-Z].*)?', 1, 1, null, 4) wh_level
from tbl;
GIVEN_LOC ETL_LOC WH_LEVEL
--------- ------- --------
A103AB A103 AB
A103A01 A103 A01
A103A02 A103 A02
A103456 A103456
A103A A103 A
A103B A103 B
ABCDEFG ABCDEFG
7 rows selected.