How can I pull the records from the first table (Table_A) which has an Attr column having embedded CODE values those contain at least one uncommon CODE value whenever compared to the second table (Table_B: CODE & DESCR cols.) so that I get table_C as the result? This is what I have so far:
Select * From Table_A
Where Attr like '%CODE%' AND
not exist (select * from Table_B
where Table_A.Attr LIKE '%'||Table_B.CODE||'%')
table A
ID | Attr |
---|---|
1 | CODE = A111 |
2 | CODE = 'A111, B222, C333, D444' |
3 | CODE = 'D444', 'E555', 'F666' |
4 | CODE = 'G777', 'B222' |
5 | ITEM = 'AFRD' AND CODE = 'C333' |
6 | ITEM = BYNM |
table_B
CODE | DESCR |
---|---|
A111 | djiefljfe |
D444 | qrrascjg |
E555 | wpofler |
F666 | nfosmwfa |
G777 | losk |
table_C
ID | Attr |
---|---|
2 | CODE = 'A111, B222, C333, D444' |
4 | CODE = 'G777', 'B222' |
5 | ITEM = 'AFRD' AND CODE = 'C333' |
CodePudding user response:
You must give Table_A
an alias, so that you can reference it in the nested select:
SELECT *
FROM
Table_A A
WHERE
Attr LIKE '%CODE%' AND
NOT EXISTS
(
SELECT *
FROM Table_B
WHERE A.Attr LIKE '%'||Table_B.CODE||'%'
)
Also, the keyword is EXISTS
not EXIST
.
SQL is not able to parse complex arbitrary expressions like ITEM = 'AFRD' AND CODE = 'C333'
and to differentiate between C333
being a code and AFRD
being an item. However, if item ids are always different from codes this is not a problem.
See: http://sqlfiddle.com/#!4/995f23/1/0
CodePudding user response:
As long as codes always follow items, you could extract the list of codes with:
regexp_substr(attr, '.*(CODE = ?)(.*?)', 1, 1, null, 2
and remove the spaces and quotes with
translate(regexp_substr(attr, '.*(CODE = ?)(.*?)', 1, 1, null, 2), q'^x' ^', 'x')
to get a simple comma-separated list of values to look for, like G777,B222
; and then split that into individual elements, for example using recursive subquery factoring; and find any that don't exist in the other table.
Which is messy, but:
with cte (id, attr, codes) as (
select id, attr,
translate(
regexp_substr(attr, '.*(CODE = ?)(.*?)', 1, 1, null, 2),
q'^x' ^', 'x'
)
from table_a
),
rcte (id, attr, codes, pos, code) as (
select id, attr, codes, 1, regexp_substr(codes, '(.*?)(,|$)', 1, 1, null, 1)
from cte
where codes is not null
union all
select id, attr, codes, pos 1, regexp_substr(codes, '(.*?)(,|$)', 1, pos 1, null, 1)
from rcte
where regexp_substr(codes, '(.*?)(,|$)', 1, pos 1, null, 1) is not null
)
select distinct r.id, r.attr
from rcte r
where not exists (
select null
from table_b b
where b.code = r.code
)
gives
ID | ATTR |
---|---|
2 | CODE = 'A111, B222, C333, D444' |
4 | CODE = 'G777', 'B222' |
5 | ITEM = 'AFRD' AND CODE = 'C333' |
The cte
gets the simple CSV values; rcte
splits those up to individual components (and assumes there will be no empty elements), and then those individual values are checked with not exists
. Which gives duplicate id/attr pairs, so distinct
removes the duplicates; but you can also see which code(s) didn't exist if you want, by not applying distinct, and potentially using listagg
to still get a single result per ID.
It's painful, but that's what happens with data stored like this...
CodePudding user response:
You can use an outer join among Table_A and Table_B in order to obtain unmatched codes after converting comma separated codes to the newly generated rows per each individual code through use of regular expression functions along with a hierarchical query such as
WITH a0 AS
(
SELECT id, CASE WHEN INSTR(attr, 'CODE')>0 THEN
REGEXP_REPLACE(attr,'(.*CODE =) ')
END AS codes
FROM Table_A
), a AS
(
SELECT id, TRIM(BOTH CHR(39) FROM TRIM(REGEXP_SUBSTR(codes,'[^,] ',1,level)) ) AS codes
FROM a0
WHERE codes IS NOT NULL
CONNECT BY LEVEL <= REGEXP_COUNT(codes, ',') 1
AND PRIOR SYS_GUID() IS NOT NULL
AND PRIOR id = id
)
SELECT a.*
FROM Table_A a
JOIN (SELECT DISTINCT id
FROM a
LEFT JOIN Table_B
ON code = codes
WHERE code IS NULL) b
ON a.id = b.id