I am trying to create two regular expressions to capture the needed characters of European license plates.
It's important to mention that:
- the delimiter separating the country (first letter) from the rest is always an * (asterisk) or a - (hyphen)
- the delimiter separating the district from the other characters on the right is always a - (hyphen)
- the district can also contain letters such as ä, ö, ü
The license plates look like this:
A*S-XXXPA
A*SL-XXXPC
A*SL-XXXSD
A*HA-XXXHV
D*R-XXXXX
D*TS-XXXXX
A*VB-1XXXXX
The RegExs I use for capturing the countries and the district are the following.
String country = "^([A-Z]{1,3})";
String district = "\\h*(\\p{L}{1,3})[-*]";
Once I get the needed information out of my Strings, I remove the characters I don't need with Java code. Here's the snippet:
if (matcher.find()) {
    country_region = matcher.group(1);
    country_region = country_region.replace("*", "");
    country_region = country_region.replace("-", "");
    country_region = country_region.replaceAll("\\s+$", "");
}
My regex capturing the countries works fine.
The one I'm having trouble with is the regex I use to capture districts. At the moment it also matches the countries.
I guess I could just remove the asterisk at the end of my RegEx, but I do not think it's the cleanest way to do it.
Thank you!
CodePudding user response:
The main issue with the district regex is that \h* matches zero or more horizontal whitespace characters, so the match can also occur at the start of the string.
Since you want to get a match after a horizontal whitespace, * or -, you can use
[*\h-](\p{L}{1,3})[-*]
See the regex demo. Here, [*\h-] matches a *, a horizontal whitespace or a - char.
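As a rough sketch of how the district pattern above could be used from Java (the class name DistrictDemo and the helper district are made up for illustration, and the sample plate with an umlaut is an assumption, not taken from the question):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DistrictDemo {
    // 1 to 3 Unicode letters, preceded by '*', a horizontal whitespace or '-',
    // and followed by '-' or '*'. Backslashes are doubled for the Java string literal.
    private static final Pattern DISTRICT =
            Pattern.compile("[*\\h-](\\p{L}{1,3})[-*]");

    // Hypothetical helper: returns the district, or null when nothing matches.
    static String district(String plate) {
        Matcher m = DISTRICT.matcher(plate);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(district("A*SL-XXXPC"));      // SL
        System.out.println(district("D*R-XXXXX"));       // R
        System.out.println(district("D*T\u00DC-XXXXX")); // TÜ (assumed sample plate)
    }
}
```

Because the leading delimiter is now required, the country letter at the start of the string can no longer be picked up as a district, and group 1 needs no further cleanup with replace().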
However, it makes sense to use a single regex that matches the whole string while capturing all parts into groups:
^([A-Z]{1,3})[\h*-](\p{L}{1,3})[-*](.+)
See this regex demo. Details:
- ^ - start of string
- ([A-Z]{1,3}) - Group 1: one, two or three uppercase letters
- [\h*-] - a horizontal whitespace, * or -
- (\p{L}{1,3}) - Group 2: one to three Unicode letters
- [-*] - a - or * char
- (.+) - Group 3: all text till the end of string/line.
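A minimal Java sketch of the single-pattern approach, assuming the third group is (.+) (one or more characters up to the end of the line). The names PlateDemo and parse are hypothetical and only serve the illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlateDemo {
    // Group 1: country, group 2: district, group 3: remainder of the plate.
    private static final Pattern PLATE =
            Pattern.compile("^([A-Z]{1,3})[\\h*-](\\p{L}{1,3})[-*](.+)");

    // Hypothetical helper: returns {country, district, rest}, or null if the plate does not match.
    static String[] parse(String plate) {
        Matcher m = PLATE.matcher(plate);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("A*VB-1XXXXX");
        System.out.println(String.join(" | ", parts)); // A | VB | 1XXXXX
    }
}
```

With one pattern, the country and the district come from the same match, so the two values cannot get out of sync and no delimiter stripping is needed afterwards.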