I have some records like below:
urutan | desc |
---|---|
1432 | AMAN032 - Gunting |
1433 | BENANG156 - Sikat Pintu |
1434 | Oli Bell One AT-D 20W40 |
1435 | Water Refill |
1436 | KABUR001 - Gosok Air |
1437 | Kabel Ties 20 Cm - 50 |
Based on the table above, i have two formats of string in field desc:
- the correct format, which are urutan 1432, 1433, 1436.
- the incorrect format, which are urutan 1434, 1435, 1437
as you can see, the correct format has format like this below:
AMAN032 - Gunting
then i split:
[AMAN][032] - [Gunting]
[a word][3 digits number][space][-][space][any words]
.
my goal is I want to SELECT all records which does not match the correct format using PosgreSQL REGEXP in WHERE Clause (in a condition I don't know which urutan). so the result from table above are urutan 1434, 1435, 1437.
CodePudding user response:
Try with the following query:
SELECT *
FROM tab
WHERE NOT desc_ ~ '^[A-Za-z] [0-9]{3} - [A-zA-Z ] $'
Regex Explanation:
^
: start of string[A-Za-z]
: any alphabetical character[0-9]{3}
: three digits-
: space dash space[A-zA-Z ]
: any combination of alphabetical character and space$
: end of string
Try it here.
Note: The hole in this regex may be in the last combination of alphabetical characters. You can either play with it at the provided link to include or exclude characters that you want/don't want to match for that part, or share further details on how that part can be composed of, and I could try improving the pattern matching.