Ignore specific characters before a delimiter with RegEx

Time:12-21

I am trying to create two regular expressions to capture the needed characters of European license plates.

It's important to mention that

  • the delimiter separating the country (the first letter) from the rest is always an * (asterisk) or a - (hyphen)
  • the delimiter separating the district from the other characters on the right is always a - (hyphen)
  • the district can also contain letters such as ä, ö, ü

The license plates look like this:

A*S-XXXPA
A*SL-XXXPC
A*SL-XXXSD
A*HA-XXXHV
D*R-XXXXX
D*TS-XXXXX
A*VB-1XXXXX

The regexes I use for capturing the country and the district are the following:

String country = "^([A-Z]{1,3})";
String district = "\\h*(\\p{L}{1,3})[-*]";

Once I get the needed information out of my Strings, I remove the parts I don't need with Java code. Here is the relevant piece:

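if (matcher.find()) {
    // Group 1 holds the captured country or district
    country_region = matcher.group(1);
    // Strip any delimiter and trailing whitespace left in the capture
    country_region = country_region.replace("*", "");
    country_region = country_region.replace("-", "");
    country_region = country_region.replaceAll("\\s+$", "");
}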

My regex capturing the countries works fine.

The one I'm having trouble with is the regex I use to capture districts. At the moment it also catches the countries...

I guess I could just remove the asterisk at the end of my RegEx, but I do not think it's the cleanest way to do it.

Thank you!

Answer:

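The main issue with the district regex is that \h* matches zero or more horizontal whitespace characters, so it can also match an empty string. As a result, a match can begin at the very start of the string, where the country letter followed by * also satisfies (\p{L}{1,3})[-*].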
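To see the problem in isolation, here is a minimal sketch that lists every capture the question's district regex produces on one of the sample plates. The class name DistrictRegexPitfall and the helper captures are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper exist only for this illustration.
final class DistrictRegexPitfall {

    // Collects group 1 of every match that find() reports, in order.
    static List<String> captures(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // District regex exactly as posted in the question.
        String district = "\\h*(\\p{L}{1,3})[-*]";
        // \h* can match nothing, so the first match starts at index 0 and
        // captures the country letter; the district is only the second match.
        System.out.println(captures(district, "A*SL-XXXPC")); // prints [A, SL]
    }
}
```

Because a single find() call returns only the first of these matches, the code in the question ends up with the country rather than the district.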

Since you want to get a match after a horizontal whitespace, * or -, you can use

[*\h-](\p{L}{1,3})[-*]

Here, [*\h-] matches a *, a horizontal whitespace character, or a - char, so the district can no longer match at the start of the string.
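A small sketch of how this pattern could be used from Java follows. The class DistrictExtractor and the method districtOf are hypothetical names, and the plate with an umlaut is a made-up input to exercise \p{L}, not one of the plates from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, shown only to illustrate the pattern above.
final class DistrictExtractor {

    // A leading delimiter, the district (group 1), then a trailing delimiter.
    private static final Pattern DISTRICT =
            Pattern.compile("[*\\h-](\\p{L}{1,3})[-*]");

    // Returns the district code, or null when the plate does not match.
    static String districtOf(String plate) {
        Matcher matcher = DISTRICT.matcher(plate);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(districtOf("A*SL-XXXPC")); // prints SL
        System.out.println(districtOf("D*R-XXXXX"));  // prints R
        // Made-up plate whose district contains an umlaut (U+00DC).
        System.out.println(districtOf("D*F\u00DC-XXXXX"));
    }
}
```

Since the delimiters sit outside the capturing group, the extra replace calls for * and - are not needed with this pattern.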

However, it makes sense to use a single regex that matches the whole string while capturing all parts into groups:

^([A-Z]{1,3})[\h*-](\p{L}{1,3})[-*](.+)

Details:

  • ^ - start of string
  • ([A-Z]{1,3}) - Group 1: one, two or three uppercase letters
  • [\h*-] - a horizontal whitespace, * or -
  • (\p{L}{1,3}) - Group 2: one to three Unicode letters
  • [-*] - a - or * char
  • (.+) - Group 3: one or more characters, up to the end of the string/line.
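As a rough sketch, the three-group approach could be wired up in Java as below. It assumes the last group is meant to be (.+), that is, everything after the second delimiter. PlateParser and parse are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser class; the names are illustrative, not from the question.
final class PlateParser {

    // Group 1: country, group 2: district, group 3: the rest of the plate.
    private static final Pattern PLATE =
            Pattern.compile("^([A-Z]{1,3})[\\h*-](\\p{L}{1,3})[-*](.+)");

    // Returns {country, district, rest}, or null when the plate does not match.
    static String[] parse(String plate) {
        Matcher matcher = PLATE.matcher(plate);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2), matcher.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("A*VB-1XXXXX"))); // prints [A, VB, 1XXXXX]
        System.out.println(Arrays.toString(parse("D*TS-XXXXX")));  // prints [D, TS, XXXXX]
    }
}
```

One pass over the string then yields all three parts, so the separate country and district patterns and the cleanup code are no longer required.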