I have a table of invoices with cost identifiers saved in a specific way, non always standard, like below:
ID | SYMBOL | COST_IDS
--- -------- -------------------
1 | FV01 | '1076219, 1081419'
2 | FV02 | '107621,123421'
3 | FV03 | '111521; 107621'
I would like to find invoices for a specific cost identifier.
The structure of the cost is (4 or more digits) (2 year digits)
In Test case: 107621
, the desired output would be ID: 2 and 3
.
SELECT * FROM INVOICES WHERE COST_IDS like '7621%'
Is a wrong approach.
I found that I need to use REGEXP_LIKE
and I am struggling with it. I know I need to find the whole world exactly, but not necessarily on the beginning.
Can anyone help me?
Edit: this seems to work in most cases, but fails with the end of the string:
SELECT *
FROM INVOICES
WHERE REGEXP_LIKE(COST_IDS, '[^|\s|,|;]107621[$|\s|,|;]')
Also is there a way to mark 'non-digit' character instead of writing specifically \s|,|;
?
Why is $
not working?
CodePudding user response:
You can use \D
to match a non-digit:
SELECT *
FROM INVOICES
WHERE REGEXP_LIKE(COST_IDS, '(^|\D)107621(\D|$)')
Which, for the sample data:
CREATE TABLE invoices (ID, SYMBOL, COST_IDS) AS
SELECT 1, 'FV01', '1076219, 1081419' FROM DUAL UNION ALL
SELECT 2, 'FV02', '107621,123421' FROM DUAL UNION ALL
SELECT 3, 'FV03', '111521; 107621' FROM DUAL;
Outputs:
ID SYMBOL COST_IDS 2 FV02 107621,123421 3 FV03 111521; 107621
db<>fiddle here
Your regular expression does not work as:
[^|\s|,|;]
matches a single character that is not either|
or\
ors
or|
or,
or|
or;
[$|\s|,|;]
matches a single character that is either$
or|
or\
ors
or|
or,
or|
or;
If you want to match either the start-of-the-string or a white space character or a comma or a semi-colon then you want the regular expression (^|\s|[,;])
.
Similarly, if you want to match end-of-the-string or a white space character or a comma or a semi-colon then you want the regular expression ($|\s|[,;])
.
CodePudding user response:
You can use the following SQL Select statement considering you have two seperated portions for cost_ids
, or presuming only one type of seperator for each cost_ids
values even if there are more portions
WITH inv AS
(
SELECT id, REGEXP_SUBSTR(cost_ids,'[[:punct:]] ') AS symbol2, cost_ids
FROM invoices
)
SELECT DISTINCT id
FROM inv
WHERE REGEXP_SUBSTR(cost_ids,'[^'''||symbol2||''' ] ',1,level) = '107621'
CONNECT BY level <= REGEXP_COUNT(cost_ids,symbol2) 1
AND PRIOR SYS_GUID() IS NOT NULL
AND PRIOR id = id
where the seperator symbol determined initially with the name symbol2
through use of [[:punct:]]
posix which is used to extract the punctuation characters.