How can I capture specific characters before a delimiter using RegEx?

I am trying to create two regular expressions to capture the needed characters of a European license plate.

I'm using two regular expressions at the moment:

Capture the country
- (^[A-Z]{1,3}[\\s])
Capture the district
- (([^\\s*-A-Z]{1,3}[*])|([^\\s*-A-Z]{1,3}[-]))

Here are examples of the license plate formats I have:

D HG-ABCDE : Country should be D, District should be HG
A S-FGHIJ  : Country should be A, District should be S 
D AC-KLMNO : Country should be D, District should be AC
A BR-PQRST : Country should be A, District should be BR
A RO*UVWXY : Country should be A, District should be RO

Once I have extracted the needed information from my Strings, I remove the parts I don't need with Java code. Here is the relevant piece:

if (matcher.find()) {
    // group(1) still contains the delimiter or the trailing whitespace
    country_region = matcher.group(1);
    country_region = country_region.replace("*", "");
    country_region = country_region.replace("-", "");
    // strip trailing whitespace
    country_region = country_region.replaceAll("\\s+$", "");
}

Now that I have explained my setup, here's the problem I am having. My district RegEx does not work as intended: it selects the wrong letters, which results in a wrong mapping on my side afterwards. Unfortunately I could not find my error, so here I am asking for help!

How may I rewrite my District RegEx to retrieve the letters after the space separating the country from the district but before the delimiters * or - ?

Thank you very much!

CodePudding user response：

The character class [^\\s*-A-Z] is a negated character class, which matches any character except the ones listed. Inside it, the part *-A denotes a range from ASCII decimal 42 (*) to 65 (A), so the class also refuses to match the letter A, along with the digits and the other characters in that range.

If you changed it to [\s*A-Z-]{1,3}, it would still match a space and a hyphen, which matches too much.
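To see the effect of the accidental range, here is a minimal sketch that runs the "-" branch of the original district pattern against two of the sample plates. The class name CharClassRangeDemo and the helper firstMatch are hypothetical names made up for this illustration; they are not part of the question's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, only for illustrating the range problem.
class CharClassRangeDemo {

    // Returns the first substring matched by the "-" branch of the original
    // district pattern, or null if nothing matches.
    static String firstMatch(String plate) {
        Matcher m = Pattern.compile("[^\\s*-A-Z]{1,3}[-]").matcher(plate);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Inside a character class, "*-A" is the range 42..65, so these all print true
        System.out.println("A".matches("[*-A]"));
        System.out.println("5".matches("[*-A]"));
        System.out.println("-".matches("[*-A]"));

        System.out.println(firstMatch("D HG-ABCDE")); // HG-
        System.out.println(firstMatch("D AC-KLMNO")); // C-  (the A is excluded by the range)
    }
}
```

The second plate shows the wrong mapping: any district containing an A loses letters, because A falls inside the excluded range.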

You could use 2 capture groups where the Country is in group 1 and the District is in group 2.

To match either - or * you can use a character class [-*]

^([A-Z]{1,3})\h+([A-Z]{1,3})[-*][A-Z]{5}$

The pattern matches:

^ Start of string
([A-Z]{1,3}) Capture 1-3 occurrences of A-Z in group 1
\h+ Match 1 or more horizontal whitespace chars
([A-Z]{1,3}) Capture 1-3 occurrences of A-Z in group 2
[-*][A-Z]{5} Match either - or * and 5 occurrences of A-Z
$ End of string

In Java

String regex = "^([A-Z]{1,3})\\h+([A-Z]{1,3})[-*][A-Z]{5}$";
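As a rough sketch of how the two capture groups could be read in Java, the following runs the sample plates from the question through the anchored pattern. It assumes the whitespace part is written as \\h+ (one or more horizontal whitespace characters, available since Java 8). The class name PlateParser and the method parse are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not taken from the question.
class PlateParser {

    // Group 1 = country, group 2 = district; the delimiter and the
    // five trailing letters are matched but not captured.
    private static final Pattern PLATE =
            Pattern.compile("^([A-Z]{1,3})\\h+([A-Z]{1,3})[-*][A-Z]{5}$");

    // Returns {country, district}, or null when the plate does not match.
    static String[] parse(String plate) {
        Matcher m = PLATE.matcher(plate);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] plates = {
            "D HG-ABCDE", "A S-FGHIJ", "D AC-KLMNO", "A BR-PQRST", "A RO*UVWXY"
        };
        for (String plate : plates) {
            String[] parts = parse(plate);
            System.out.println(plate + " -> "
                    + (parts == null ? "no match" : parts[0] + " / " + parts[1]));
        }
    }
}
```

Because the delimiter sits outside the capture groups, the replace calls that strip * and - afterwards should no longer be needed.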


To capture only the two groups, without matching the characters at the end, and to start at a word boundary instead of the start of the string:

\b([A-Z]{1,3})\h+([A-Z]{1,3})[-*]
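A minimal sketch of the word-boundary variant in Java, again assuming \\h+ for the whitespace part. Since this pattern is not anchored at both ends, it can be used with find() on text that contains more than the plate itself. The names PlateFinder and findPrefix are hypothetical and exist only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for the unanchored variant.
class PlateFinder {

    // Starts at a word boundary and stops right after the delimiter,
    // so the letters following - or * are not checked.
    private static final Pattern PREFIX =
            Pattern.compile("\\b([A-Z]{1,3})\\h+([A-Z]{1,3})[-*]");

    // Returns {country, district} for the first plate prefix found, or null.
    static String[] findPrefix(String text) {
        Matcher m = PREFIX.matcher(text);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] parts = findPrefix("plate A RO*UVWXY, parked");
        System.out.println(parts[0] + " / " + parts[1]); // A / RO
    }
}
```

The trade-off is that this variant no longer validates the rest of the plate, so it also accepts strings whose tail is not five letters.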
