Home > Software design >  Verify that a varchar follows a certain pattern
Verify that a varchar follows a certain pattern

Time:09-25

CREATE TABLE CODES(analysis_date Date, code Varchar(50));

INSERT INTO CODES VALUES('2021-01-01','PROD_OP21_115');
INSERT INTO CODES VALUES('2021-02-06','PROD_TR21_009R');
INSERT INTO CODES VALUES('2021-03-14','PROD_TR21_002');
INSERT INTO CODES VALUES('2021-05-01','PROD_OP21_050R');
INSERT INTO CODES VALUES('2020-06-04','PROD_OP20_450');

I have a column that stores a string, for example PROD_OP21_182. I am developing a query that returns all codes that are correctly entered (meet all conditions).

  • It always starts with PROD_.
  • After PROD_, there should be one of these acronyms: OP, UG or TR.
  • 21 means the last two digits of the year given by another column (analysis date).
  • A last _ and three numbers.
  • The code can end with an R (it is optional). Example: PROD_OP21_182R.
  • Uppercase always.

This is my attempt:

select CODE
from CODES
where SUBSTRING(CODE, 1, 4) = 'PROD'
and SUBSTRING(CODE, 5, 1) = '_'
and SUBSTRING(CODE, 6, 2) in ('OP','UG','TR')
and RIGHT(YEAR(analysis_date), 2) = SUBSTRING(CODE, 8, 2) 
and SUBSTRING(CODE, 10, 1) = '_'
and ( 
(len(CODE) = 13)
or 
(len(CODE) = 14) and CODE like '%R')

It works partially. I still need to check that the three characters in SUBSTRING(dispatch_number, 11, 3) are numbers, and that the rule of capital letters is fulfilled (simply, there cannot be lowercase in the code).

CodePudding user response:

You can use a case sensitive collation, for example

..
and SUBSTRING(CODE, 6, 2) COLLATE Latin1_General_CS_AS in ('OP','UG','TR')
..

and LIKE with a simple regex supported by SQL Server

.. 
and SUBSTRING(CODE, 11, 3) like '[0-9][0-9][0-9]' 
..

CodePudding user response:

First of all kudos on posting sample data and a table structure to make it easy for anybody to help. I wish everybody could do that. As I mentioned in my comment the table structure you have here is why this is so difficult. You are storing a bunch of information in a single tuple which violates 1NF. However we can't always fix that and have to query the mess we are given.

This is one way to solve this one. Instead of substring I am using parsename. It is a little more code but I find using substring really fiddly because when something changes in the logic you have to go fiddle with every instance of substring to get the new values right.

select *
from CODES
where parsename(replace(code, '_', '.'), 3) COLLATE Latin1_General_CS_AS = 'PROD'
    AND
    (
        left(parsename(replace(code, '_', '.'), 2), 2) in ('OP','UG','TR')
        OR
        try_parse(right(parsename(replace(code, '_', '.'), 2), 2) as int) = datepart(year, analysis_date)
    )
    AND
    (
        parsename(replace(code, '_', '.'), 1) like '[0-9][0-9][0-9]' 
        OR
        parsename(replace(code, '_', '.'), 1) COLLATE Latin1_General_CS_AS like '[0-9][0-9][0-9][R]'
    )
  • Related