A SAS library name (libref) must follow three rules:
- it is no more than 8 characters long;
- it may consist of underscores, digits, and English letters;
- it must start with an underscore or an English letter.
My question: how can I use a Perl regular expression to check whether a string contains an invalid library name?
The string consists of words separated by single spaces, like the following:
sasuser work sashelp
sasuser work 7z sashelp
sasuser work dictionary
7z and dictionary do not satisfy the rules, so I want the output 0, 1, 1 for the three input strings.
I have tried this in SAS, but it doesn't work:
data test;
input string&$42.;
x=prxmatch('/\b(?=\S )(?![A-Za-z_][A-Za-z0-9_]{0,7})\b/',string);
put x=;
cards;
sasuser work sashelp
sasuser work 7z sashelp
sasuser work dictionary
;
run;
Thanks for any hint.
Edit:2022-11-11
I am really looking for a regex solution, whether or not it uses the SAS language. My idea is as follows:
- check whether the string contains a word;
- that word fails to match a regular expression;
- the regular expression describes the SAS library naming rules.
Is that possible?
CodePudding user response:
You appear to be testing if ANY of the words in the list are valid librefs. Instead test each word in the string separately.
Note that SAS already has a function, NVALID(), to test if a name is valid, but you need to add an additional test to make sure the length is not too long to use as a libref or fileref.
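data test;
  input string $80. ;
  /* Split the string on blanks and test each word separately */
  do index=1 to countw(string,' ');
    word = scan(string,index,' ');
    /* NVALID checks the naming rules; LENGTHN enforces the 8-character limit */
    nvalid=nvalid(word,'v7') and lengthn(word) in (1:8);
    /* The pattern needs both the opening and the closing slash */
    x=prxmatch('/\b[A-Za-z_][A-Za-z0-9_]{0,7}\b/',word);
    output;
  end;
cards;
sasuser work sashelp
sasuser work 7z sashelp
sasuser work dictionary
;
run;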
Result:
Obs  string                   index  word        nvalid  x
  1  sasuser work sashelp         1  sasuser          1  1
  2  sasuser work sashelp         2  work             1  1
  3  sasuser work sashelp         3  sashelp          1  1
  4  sasuser work 7z sashelp      1  sasuser          1  1
  5  sasuser work 7z sashelp      2  work             1  1
  6  sasuser work 7z sashelp      3  7z               0  0
  7  sasuser work 7z sashelp      4  sashelp          1  1
  8  sasuser work dictionary      1  sasuser          1  1
  9  sasuser work dictionary      2  work             1  1
 10  sasuser work dictionary      3  dictionary       0  0
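If a single regex over the whole string is preferred, the three steps from the question's edit (find a word, check that it fails the naming pattern, let the pattern encode the libref rules) could be sketched with lookarounds. This is only a sketch, not a tested solution: the pattern is my own assumption, the data set name check is hypothetical, and it relies on the PRX functions supporting negative lookbehind and lookahead. It is intended to produce the 0, 1, 1 asked for in the question.

```sas
data check;
  input string $80.;
  /* Assumed pattern: at the start of a word (no non-blank character before it),
     require that a valid libref followed by a blank or end of string does NOT
     match, then consume one non-blank character so that a word really exists. */
  x = prxmatch('/(?<!\S)(?![A-Za-z_][A-Za-z0-9_]{0,7}(?!\S))\S/', string) > 0;
  put string= x=;
cards;
sasuser work sashelp
sasuser work 7z sashelp
sasuser work dictionary
;
run;
```

PRXMATCH returns the position of the first match, so comparing it with 0 turns the result into a 1/0 flag. Trailing blanks in string are harmless here because the pattern has to start on a non-blank character.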
CodePudding user response:
If you want to do this without a Perl regex, here is a solution:
First let's get some more sample data:
data test;
input string&$42.;
cards;
sasuser work sashelp
sas_user _work 7z sashelp
sasuser work77 dictionary
;
run;
Here, the resulting column "valid" consists of a list of flags (1 for valid, 0 for invalid):
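data validation (drop=i txt);
  set test;
  length valid $12;
  /* Use a blank as the only delimiter, matching the SCAN call below */
  do i=1 to countw(string,' ');
    txt=scan(string,i,' ');
    if txt ne '' then do;
      /* Invalid if: longer than 8 characters, */
      if (length(txt) gt 8
          /* or the first character is not a letter or an underscore, */
          or substr(txt,1,1) eq compress(substr(txt,1,1),'_' , 'a')
          /* or any character is not a letter, a digit, or an underscore */
          or txt ne compress(txt, ,'kan')
         )
        then valid=catx(', ',valid,'0');
      else valid=catx(', ',valid,'1');
    end;
  end;
run;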
Result:
string                       valid
-----------------------------------------
sasuser work sashelp         1, 1, 1
sas_user _work 7z sashelp    1, 1, 0, 1
sasuser work77 dictionary    1, 1, 0
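The question asked for one flag per string rather than one per word. A minimal sketch of that variant, assuming the test data set created above and borrowing the NVALID check from the other answer, might look like this. The names flagged and has_invalid are hypothetical.

```sas
data flagged (drop=i txt);
  set test;
  /* has_invalid becomes 1 as soon as any word fails the libref rules */
  has_invalid = 0;
  do i = 1 to countw(string, ' ');
    txt = scan(string, i, ' ');
    /* NVALID with 'v7' checks the character rules; LENGTHN enforces the 8-character limit */
    if not (nvalid(txt, 'v7') and lengthn(txt) <= 8) then has_invalid = 1;
  end;
run;
```

The loop keeps going after the first invalid word for simplicity. Adding a LEAVE statement once has_invalid is set would stop it early.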