I am trying to create two regular expressions to capture the needed characters of a European license plate.
I'm using two RegEx at the moment:
- Capture the country
(^[A-Z]{1,3}[\\s])
- Capture the district
(([^\\s*-A-Z]{1,3}[*])|([^\\s*-A-Z]{1,3}[-]))
Here are examples of the license plate formats I have:
D HG-ABCDE : Country should be D, District should be HG
A S-FGHIJ : Country should be A, District should be S
D AC-KLMNO : Country should be D, District should be AC
A BR-PQRST : Country should be A, District should be BR
A RO*UVWXY : Country should be A, District should be RO
Once I get the needed information out of my Strings, I remove the parts I don't need with Java code. Here's the relevant snippet:
if (matcher.find()) {
    country_region = matcher.group(1);
    // Strip the delimiter and trailing whitespace captured by the group
    country_region = country_region.replace("*", "");
    country_region = country_region.replace("-", "");
    country_region = country_region.replaceAll("\\s+$", "");
}
Here's the problem I am having: my District RegEx does not work as intended. It selects the wrong letters, which results in a wrong mapping on my side afterwards. Unfortunately I could not find my error, so I am asking for help!
How can I rewrite my District RegEx to retrieve the letters after the space separating the country from the district, but before the delimiter * or - ?
Thank you very much!
CodePudding user response:
The character class [^\\s*-A-Z] is a negated character class, which matches any char except the ones listed. The part *-A denotes a range from ASCII decimal 42 (*) to 65 (A), so the class will also not match the A char.
If you changed it to [\s*A-Z-]{1,3}, it would still match a space and a hyphen, which matches too much.
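To make the range issue concrete, here is a minimal Java sketch that tests single characters against the class [*-A] on its own. The class name RangeDemo and the helper inRange are made up for this illustration; the point is only that the range covers code points 42 through 65, which includes the hyphen, the digits and A, but not B.

```java
import java.util.regex.Pattern;

public class RangeDemo {
    // Hypothetical helper: true if the single-char string falls in the range * (42) to A (65).
    public static boolean inRange(String s) {
        return Pattern.matches("[*-A]", s);
    }

    public static void main(String[] args) {
        // Characters chosen to sit inside and outside the 42-65 range.
        String[] samples = { "*", "-", "5", "A", "B", " " };
        for (String s : samples) {
            System.out.println("'" + s + "' in [*-A]: " + inRange(s));
        }
    }
}
```

Because the hyphen sits inside that range, a negated class built on it rejects more characters than the pattern's author probably intended.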
You could use 2 capture groups where the Country is in group 1 and the District is in group 2.
To match either - or * you can use a character class [-*]
^([A-Z]{1,3})\h+([A-Z]{1,3})[-*][A-Z]{5}$
The pattern matches:
- ^ : Start of string
- ([A-Z]{1,3}) : Capture 1-3 occurrences of A-Z in group 1
- \h+ : Match 1+ occurrences of a horizontal whitespace char
- ([A-Z]{1,3}) : Capture 1-3 occurrences of A-Z in group 2
- [-*][A-Z]{5} : Match either - or * and 5 occurrences of A-Z
- $ : End of string
In Java:
String regex = "^([A-Z]{1,3})\\h+([A-Z]{1,3})[-*][A-Z]{5}$";
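As a rough sketch of how the two groups could be read in Java, the example below runs the anchored pattern against the five sample plates from the question. The class name PlateParser and the helper parse are hypothetical names introduced here, and returning null for a non-matching plate is just one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlateParser {
    // Group 1 = country, group 2 = district; \h+ allows one or more horizontal whitespace chars.
    private static final Pattern PLATE =
            Pattern.compile("^([A-Z]{1,3})\\h+([A-Z]{1,3})[-*][A-Z]{5}$");

    // Hypothetical helper: returns {country, district}, or null if the plate does not match.
    public static String[] parse(String plate) {
        Matcher matcher = PLATE.matcher(plate);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String[] plates = { "D HG-ABCDE", "A S-FGHIJ", "D AC-KLMNO", "A BR-PQRST", "A RO*UVWXY" };
        for (String plate : plates) {
            String[] parts = parse(plate);
            System.out.println(plate + " : country=" + parts[0] + ", district=" + parts[1]);
        }
    }
}
```

Since the delimiter sits outside both capture groups, no replace calls are needed afterwards to strip the *, - or trailing space.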
To capture only the first 2 groups, without matching the chars at the end, and starting with a word boundary:
\b([A-Z]{1,3})\h+([A-Z]{1,3})[-*]
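The unanchored variant can be used with find() to pick plate prefixes out of a longer string. The following is a minimal sketch under that assumption; the class name PlatePrefixFinder, the helper findPrefixes and the country/district output format are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlatePrefixFinder {
    // Word boundary, country (group 1), whitespace, district (group 2), then the delimiter.
    private static final Pattern PREFIX =
            Pattern.compile("\\b([A-Z]{1,3})\\h+([A-Z]{1,3})[-*]");

    // Hypothetical helper: collects every "country/district" pair found in the text.
    public static List<String> findPrefixes(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PREFIX.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(1) + "/" + matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findPrefixes("seen: D HG-ABCDE and A RO*UVWXY"));
    }
}
```

Note that without the trailing [A-Z]{5}$ this variant does not validate the rest of the plate, so it is only suitable when the input is already known to contain plates in the formats above.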