Oracle SQL: Using WHERE LIKE but for specific whole words / REGEXP

I have a table of invoices with cost identifiers saved in a specific way, non always standard, like below:

ID | SYMBOL | COST_IDS
--- -------- -------------------
 1 | FV01   | '1076219, 1081419'
 2 | FV02   | '107621,123421'
 3 | FV03   | '111521; 107621'

I would like to find invoices for a specific cost identifier.

The structure of the cost is (4 or more digits) (2 year digits)

In Test case: 107621, the desired output would be ID: 2 and 3.

SELECT * FROM INVOICES WHERE COST_IDS like '7621%' Is a wrong approach.

I found that I need to use REGEXP_LIKE and I am struggling with it. I know I need to find the whole world exactly, but not necessarily on the beginning.

Can anyone help me?

Edit: this seems to work in most cases, but fails with the end of the string:

SELECT * 
FROM INVOICES 
WHERE REGEXP_LIKE(COST_IDS, '[^|\s|,|;]107621[$|\s|,|;]')

Also is there a way to mark 'non-digit' character instead of writing specifically \s|,|;?

Why is $ not working?

CodePudding user response：

You can use \D to match a non-digit:

SELECT *
FROM   INVOICES
WHERE  REGEXP_LIKE(COST_IDS, '(^|\D)107621(\D|$)')

Which, for the sample data:

CREATE TABLE invoices (ID, SYMBOL, COST_IDS) AS
SELECT 1, 'FV01', '1076219, 1081419' FROM DUAL UNION ALL
SELECT 2, 'FV02', '107621,123421' FROM DUAL UNION ALL
SELECT 3, 'FV03', '111521; 107621' FROM DUAL;

Outputs:

ID SYMBOL COST_IDS

2 FV02 107621,123421

3 FV03 111521; 107621

ID	SYMBOL	COST_IDS
2	FV02	107621,123421
3	FV03	111521; 107621

db<>fiddle here

Your regular expression does not work as:

[^|\s|,|;] matches a single character that is not either | or \ or s or | or , or | or ;
[$|\s|,|;] matches a single character that is either $ or | or \ or s or | or , or | or ;

If you want to match either the start-of-the-string or a white space character or a comma or a semi-colon then you want the regular expression (^|\s|[,;]).

Similarly, if you want to match end-of-the-string or a white space character or a comma or a semi-colon then you want the regular expression ($|\s|[,;]).

CodePudding user response：

You can use the following SQL Select statement considering you have two seperated portions for cost_ids, or presuming only one type of seperator for each cost_ids values even if there are more portions

WITH inv AS
(
 SELECT id, REGEXP_SUBSTR(cost_ids,'[[:punct:]] ') AS symbol2, cost_ids
   FROM invoices
)
 SELECT DISTINCT id
   FROM inv   
  WHERE REGEXP_SUBSTR(cost_ids,'[^'''||symbol2||''' ] ',1,level) = '107621' 
CONNECT BY level <= REGEXP_COUNT(cost_ids,symbol2)   1 
    AND PRIOR SYS_GUID() IS NOT NULL
    AND PRIOR id = id

where the seperator symbol determined initially with the name symbol2 through use of [[:punct:]] posix which is used to extract the punctuation characters.

Demo