As per the swift documentation, The SWIFT X character set consists of the below
X Character Set – SWIFT Character Set
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
/ – ? : ( ) . , ‘ CrLf Space
I have come up with the below to validate the swift character set which seems to be working but want to know if there is a better way of doing it.Also what should I use for CRLF to be OS neutral.Since i use unix I have put chr(10)
^[a-zA-Z0-9 -?:().,'' chr(10)/]*$
CodePudding user response:
Unfortunately, a range like a-z
may include accented letters and collation elements, depending on the value of nls_sort
at the time of running a query. And, alas, Oracle does not support the character class [[:ascii:]]
, which would be another way to specify what you need.
You have two choices. Either you specify the nls_sort
parameter explicitly every time, before running the query (or rely on it being something like 'English'
already), which to me doesn't sound like a good practice; or you specify all letters explicitly.
There are a few more things to fix. The dash -
has special meaning in a bracketed expression; if you want it to mean a literal dash, it should appear as either the first or the last character in the list, where it can't have its special meaning. All other regexp special characters are not special in a bracketed expression, so you don't need to worry about dot, question mark, asterisk, parentheses, etc.
However, note that the single-quote character, while it has no special meaning in a regular expression (in a bracketed expression or otherwise), it does have a special meaning in a string in Oracle; to include a single-quote in a hard-coded string, you must escape it by typing two single-quote characters.
Then - if you write chr(10)
in a bracketed expression, that is characters c
, h
, ... - if you mean LF, you need to either actually include a newline character in your string (probably a bad idea), or concatenate it by hand.
And if you want to validate against the official character set of "swift x" (whatever that is), you should include all characters, regardless of your OS. So you should accept CR (chr(13)) too, unless you have a better reason to omit it. If it is present but you don't want it in your db, you should accept it and then remove it after the fact and save the resulting string (after you remove CR), not reject the entire string altogether.
To keep the work organized, I would create a very small table (or view) to store the needed validation string, then use it in all queries that need it.
Something like this:
create table swift_validation (validation_pattern varchar2(100));
insert into swift_validation (validation_pattern)
with helper(ascii_letters) as (
select 'abcdefghijklmnopqrstuvwxyz' from dual
)
select '^[' || ascii_letters -- a-z (ASCII)
|| upper(ascii_letters) -- A-Z (ASCII)
|| '0-9'
|| chr(10) || chr(13) -- LF and CR
|| '/?:().,'' -'
|| ']*$'
from helper
;
commit;
Check what was saved in the table:
select * from swift_validation;
VALIDATION_PATTERN
------------------------------------------------------------
^[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0-9
/?:().,' -]*$
Note that the result is in three lines. chr(10)
is seen as a newline; then chr(13)
by itself is converted to another newline.
In any case, if you really want to see the exact characters saved in this string, you can use the dump
function. With option 17
for the second argument, the output is readable (you will have to scroll though):
select dump(validation_pattern, 17) from swift_validation;
DUMP(VALIDATION_PATTERN,17)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Typ=1 Len=73: ^,[,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,0,-,9,^J,^M,/,?,:,(,),.,,,', , ,-,],*,$
Notice in particular the control characters, ^J
and ^M
; they mean chr(10)
and chr(13)
respectively (easy to remember: J and M are the tenth and thirteenth letters of the Latin alphabet).
Then you use this as follows:
with
test_strings (str) as (
select 'abc (12)' from dual union all
select '$122.38' from dual union all
select null from dual union all
select 'café au lait' from dual union all
select 'A / B - C * D' from dual
)
select t.str,
case when regexp_like(t.str, sv.validation_pattern)
then 'valid' else 'invalid' end as swift_valid
from test_strings t, swift_validation sv
;
STR SWIFT_VALID
------------- -----------
abc (12) valid
$122.38 invalid
invalid
café au lait invalid
A / B - C * D invalid
Notice one last oddity here. In my test, I included a row where the input string is empty (null
). Regular expressions are odd in this respect: null
is not (regexp_) like something like 'a*'
- even though *
is supposed to mean "zero or more ...". Oracle's reasoning, perhaps, is that null
may be anything - just one of the hundreds of ways the Oracle identification of null
and "empty string" is just plain idiotic. It is what it is though; make sure you don't reject a row with an empty string. I assume "swift x" allows empty strings. You will need to handle that separately, like this:
with
test_strings (str) as (
select 'abc (12)' from dual union all
select '$122.38' from dual union all
select null from dual union all
select 'café au lait' from dual union all
select 'A / B - C * D' from dual
)
select t.str,
case when t.str is null
or regexp_like(t.str, sv.validation_pattern)
then 'valid' else 'invalid' end as swift_valid
from test_strings t, swift_validation sv
;
STR SWIFT_VALID
------------- -----------
abc (12) valid
$122.38 invalid
valid
café au lait invalid
A / B - C * D invalid
Left as exercise:
You may need to find the invalid characters in an invalid string. For such generalized applications (more than a straight validation of a whole string), you might be better off saving just the bracketed expression in the swift_validation
table (without the leading anchor ^
, and the trailing quantifier *
and anchor $
). Then you need to re-write the validation query slightly, to concatenate these fragments to the validation pattern in the regexp_like
condition; but then you can include, for example, an additional column to show the first invalid character in an invalid string.
EDIT
In follow-up discussion (see comments below this answer), the OP clarified that only the combination chr(13) || chr(10)
(in that order) is permitted. chr(10)
and chr(13)
are invalid if they appear by themselves, or in the wrong order.
This makes the problem more interesting (more complicated). To allow only the letters a, b, c or the sequence xy (that is: x alone, or y alone, are not allowed; every x must appear followed immediately by y, and every y must appear immediately preceded by x), the proper matching pattern looks like
'^([abc]|xy)*$'
Here expr1|expr2
is alternation, and it needs to be enclosed in parentheses to apply the *
quantifier.
An additional complication is that $
doesn't actually match "the end of the input string"; it anchors either at the end of the input string, or if the input string ends in newline (chr(10)
), it anchors before that character. Happily, there is the alternative anchor \z
that doesn't suffer from that defect; it anchors truly at the end of the input string. This will be needed if we don't want to validate input strings that end in chr(10)
not preceded immediately by chr(13)
. (If we do want to allow those - even though technically they do violate the "swift x" rules - then replace \z
with $
as we had it before).
Here I demonstrate a slightly modified approach - now the small table that stores the validation rule only contains the alternation bit - either one character out of an enumeration, or the two-character sequence chr(13) || chr(10)
), letting the "caller" wrap this within whatever is needed for a complete matching pattern.
The small table (note that I changed the column name):
drop table swift_validation purge;
create table swift_validation (valid_patterns varchar2(100));
insert into swift_validation (valid_patterns)
with helper(ascii_letters) as (
select 'abcdefghijklmnopqrstuvwxyz' from dual
)
select '[' -- open bracketed expression
|| ascii_letters -- a-z (ASCII)
|| upper(ascii_letters) -- A-Z (ASCII)
|| '0-9'
|| '/?:().,'' -' -- '' escape for ', - last
|| ']' -- close bracketed expression
|| '|' -- alternation
|| chr(13) || chr(10) -- CR LF
from helper
;
commit;
Testing (notice the modified match pattern: now the ^
and \z
anchors, the parentheses and the *
quantifier are hard-coded in the query, not in the saved string):
with
test_strings (id, str) as (
select 1, 'abc (12)' from dual union all
select 2, '$122.38' from dual union all
select 3, null from dual union all
select 4, 'no_underline' from dual union all
select 5, 'A / B - C * D' from dual union all
select 6, 'abc' || chr(10) || chr(13) from dual union all
select 7, 'abc' || chr(10) from dual union all
select 8, 'abc' || chr(13) || chr(10) from dual union all
select 9, 'café au lait' from dual
)
select t.id, t.str,
case when t.str is null
or regexp_like(t.str, '^(' || sv.valid_patterns || ')*\z')
then 'valid' else 'invalid' end as swift_valid
from test_strings t, swift_validation sv
;
ID STR SWIFT_VALID
-- ------------- -----------
1 abc (12) valid
2 $122.38 invalid
3 valid
4 no_underline invalid
5 A / B - C * D invalid
6 abc invalid
7 abc invalid
8 abc valid
9 café au lait invalid
The newline characters (CR and LF) aren't clearly visible in the output; I added an id
column so you can reference the output according to the input in the with
clause.