PostgreSQL: How to use REGEXP in WHERE clause?-CodePudding

I have some records like below:

urutan	desc
1432	AMAN032 - Gunting
1433	BENANG156 - Sikat Pintu
1434	Oli Bell One AT-D 20W40
1435	Water Refill
1436	KABUR001 - Gosok Air
1437	Kabel Ties 20 Cm - 50

Based on the table above, i have two formats of string in field desc:

the correct format, which are urutan 1432, 1433, 1436.
the incorrect format, which are urutan 1434, 1435, 1437

as you can see, the correct format has format like this below:

AMAN032 - Gunting

then i split:

[AMAN][032] - [Gunting]

[a word][3 digits number][space][-][space][any words].

my goal is I want to SELECT all records which does not match the correct format using PosgreSQL REGEXP in WHERE Clause (in a condition I don't know which urutan). so the result from table above are urutan 1434, 1435, 1437.

CodePudding user response：

Try with the following query:

SELECT * 
FROM tab
WHERE NOT desc_ ~ '^[A-Za-z] [0-9]{3} - [A-zA-Z ] $'

Regex Explanation:

^: start of string
[A-Za-z] : any alphabetical character
[0-9]{3}: three digits
-: space dash space
[A-zA-Z ] : any combination of alphabetical character and space
$: end of string

Try it here.

Note: The hole in this regex may be in the last combination of alphabetical characters. You can either play with it at the provided link to include or exclude characters that you want/don't want to match for that part, or share further details on how that part can be composed of, and I could try improving the pattern matching.