I have a sample program that uses two different Java regex libraries, Generex and Xeger.
The program generates strings from a regex pattern. The pattern I use is
^([0-9]{5,6}-)?[^-]+$
The following is my sample program.
import com.mifmif.common.regex.Generex;
import nl.flotsam.xeger.Xeger;

public class PatternGenerator {
    public static void main(String[] args) {
        // Generate three sample strings with Xeger
        Xeger x = new Xeger("^([0-9]{5,6}-)?[^-]+$");
        for (int i = 0; i < 3; i++) {
            System.out.println("Xeger: " + x.generate());
        }
        // Generate three sample strings with Generex
        Generex g = new Generex("^([0-9]{5,6}-)?[^-]+$");
        for (int i = 0; i < 3; i++) {
            System.out.println("Generex:" + g.random());
        }
    }
}
I receive the following output.
Xeger: ^"믟ꍥ잲$'涢$$
Xeger: ^츣$()'氷,%*$,䷝(궞ᴸ $娐⮁$$ ")%予&,$
Xeger: ^4# 妡,䯒 醁꣡(킒)($
Generex:^㬹)$댮$ $(((ⷠ(玖㐳 它$$$
Generex:^蝙$
Generex:^3/ⸯ꫰$$$(&$
Unfortunately, the output is not readable. If I give the same regex to an online generator, I get different output. For example, https://www.browserling.com/tools/text-from-regex produces the following output.
LUPK*WqG)e8Od_LYtKq;Wp:N &sy>]sGSt[&sj>r|6HQBr)|W<IDy'CeY
96817-ie;Y~Mb@673#Y2e:vlGXDz5\AjyLE4hdqpu;^sqY7ziyYCF,,A5]}n;@4.\4\~`
590766-yAVPh1,fe&>uc*WA2s,
T1'K.skX~[e#$dK'SubJ
06278->THw_YTnH`n"?Jf1n}"v<<xy1SCeQ/WF%G(tZ(VD_J,t1YrQ,TZ@{k
In my Maven pom.xml I am using the following Generex and Xeger dependencies.
<dependency>
    <groupId>com.github.mifmif</groupId>
    <artifactId>generex</artifactId>
    <version>1.0.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.github.krraghavan/xeger -->
<dependency>
    <groupId>com.github.krraghavan</groupId>
    <artifactId>xeger</artifactId>
    <version>1.0.0-RELEASE</version>
</dependency>
Why is the output of my program unreadable?
Answer:
If we take your expression ^([0-9]{5,6}-)?[^-]+$ apart:
^
- beginning of line - OK
([0-9]{5,6}-)?
- an optional block of 5 or 6 digits followed by a hyphen: the generator is using this correctly
[^-]+
- one or more characters, each of which can be anything except a hyphen: this allows literally thousands of characters, and only a few hundred of them are readable, so the readable proportion of the output is small. If you look closely, the output does contain some readable characters.
$
- end of line - OK
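To put a rough number on that proportion, here is a minimal sketch that counts how many values [^-] can match and how many of those are printable ASCII. It assumes the generator's alphabet is the full 16-bit Java char range; that is an assumption for illustration, not something confirmed for Generex or Xeger, and the class and method names are made up for this example.

```java
public class ReadableShare {
    // Number of 16-bit char values other than the hyphen, i.e. the candidate
    // set for [^-] if the alphabet is the whole char range (an assumption).
    static int candidatesExcludingHyphen() {
        int n = 0;
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            if (c != '-') {
                n++;
            }
        }
        return n;
    }

    // Printable ASCII (space through tilde), again excluding the hyphen.
    static int printableAsciiExcludingHyphen() {
        int n = 0;
        for (int c = 0x20; c <= 0x7E; c++) {
            if (c != '-') {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int all = candidatesExcludingHyphen();
        int readable = printableAsciiExcludingHyphen();
        System.out.printf("%d of %d candidates are printable ASCII (%.3f%%)%n",
                readable, all, 100.0 * readable / all);
    }
}
```

Under that assumption, 94 of 65535 candidates are printable ASCII. This only describes the size of the candidate set; how a particular generator samples from it may well be non-uniform, so the share of readable characters in real output can differ.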
You might wish to modify your regex so that it restricts the allowed characters, for example to
^([0-9]{5,6}-)?(?!-([\w\s]))+$
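Another way to restrict the output, sketched here under assumptions, is to replace "anything but a hyphen" with an explicit allow-list such as [A-Za-z0-9 ]+. The check below uses only java.util.regex (not Generex or Xeger) to show that strings built from the allow-list still satisfy the original pattern, assuming the quantifier after [^-] in the original is +. The allow-list and the class name are hypothetical choices for illustration.

```java
import java.util.regex.Pattern;

public class RestrictedPatternCheck {
    // Pattern from the question, interpreted with java.util.regex semantics.
    static final Pattern ORIGINAL =
            Pattern.compile("^([0-9]{5,6}-)?[^-]+$");

    // Hypothetical narrower variant: an explicit allow-list of letters,
    // digits and spaces instead of "any character except a hyphen".
    static final Pattern RESTRICTED =
            Pattern.compile("^([0-9]{5,6}-)?[A-Za-z0-9 ]+$");

    // True when the string satisfies both the original and the narrower pattern.
    static boolean matchesBoth(String s) {
        return ORIGINAL.matcher(s).matches() && RESTRICTED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "96817-abc DEF",      // digit prefix plus readable tail
            "T1Ksk",              // no prefix, readable tail
            "96817-\uBBDF\uA365", // digit prefix plus non-ASCII tail
            "12-abc"              // hyphen in the tail, too few digits for the prefix
        };
        for (String s : samples) {
            System.out.println(s + " -> original=" + ORIGINAL.matcher(s).matches()
                    + ", restricted=" + RESTRICTED.matcher(s).matches());
        }
    }
}
```

Whether Generex or Xeger accept this exact syntax is not verified here. Every generated line in the question starts with a literal ^ and ends with a literal $, which suggests these generators may emit the anchors as ordinary characters; if so, passing the pattern without ^ and $ may be worth trying. That is an inference from the output shown, not documented behaviour.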