How can I capture specific characters before a delimiter using RegEx?

Time:10-04

I am trying to create two regular expressions to capture the needed characters of a European license plate.

I'm using two regular expressions at the moment:

  1. Capture the country
    • (^[A-Z]{1,3}[\\s])
  2. Capture the district
    • (([^\\s*-A-Z]{1,3}[*])|([^\\s*-A-Z]{1,3}[-]))

Here are examples of the license plate formats I have:

D HG-ABCDE : Country should be D, District should be HG
A S-FGHIJ  : Country should be A, District should be S 
D AC-KLMNO : Country should be D, District should be AC
A BR-PQRST : Country should be A, District should be BR
A RO*UVWXY : Country should be A, District should be RO

Once I get the needed information out of my Strings, I remove the characters I don't need with Java code. Here's the relevant piece:

if (matcher.find()) {
    country_region = matcher.group(1);
    country_region = country_region.replace("*", "");
    country_region = country_region.replace("-", "");
    country_region = country_region.replaceAll("\\s $", "");
}

Here's the problem I am having: my District RegEx does not work as intended. It selects the wrong letters, which results in a wrong mapping afterwards. I could not find my error, so I am asking for help.

How can I rewrite my District RegEx to retrieve the letters after the space separating the country from the district, but before the delimiter * or -?

Thank you very much!

CodePudding user response:

The character class [^\\s*-A-Z] is a negated character class, which matches any char except the ones listed. The part *-A denotes a range from ASCII decimal 42 to 65, so the class also excludes the A char, along with everything else in that range, such as the digits and the hyphen.

If you changed it to [\s*A-Z-]{1,3}, it could still match a space and a hyphen, which matches too much.
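To see the effect of the accidental range, the short sketch below tests single characters against the original class. The class name NegatedClassDemo and the helper matchesChar are made up for this illustration.

```java
import java.util.regex.Pattern;

final class NegatedClassDemo {
    // The class from the question as the regex engine sees it: [^\s*-A-Z]
    // The sequence *-A inside it is parsed as a range (ASCII 42 to 65).
    static final Pattern ORIGINAL = Pattern.compile("[^\\s*-A-Z]");

    // Hypothetical helper: true if the whole input is one char accepted by the class.
    static boolean matchesChar(String ch) {
        return ORIGINAL.matcher(ch).matches();
    }

    public static void main(String[] args) {
        // A, the digit 5 and the hyphen all lie inside the range *-A, so they are excluded.
        for (String ch : new String[] { "A", "B", "5", "-" }) {
            System.out.println("'" + ch + "' matches: " + matchesChar(ch));
        }
    }
}
```

A is rejected while B is accepted, which is why the district letters come out wrong for plates such as "D AC-KLMNO".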


You could use 2 capture groups where the Country is in group 1 and the District is in group 2.

To match either - or * you can use a character class [-*]

^([A-Z]{1,3})\h([A-Z]{1,3})[-*][A-Z]{5}$

The pattern matches:

  • ^ Start of string
  • ([A-Z]{1,3}) Capture 1-3 occurrences of A-Z in group 1
  • \h Match a single horizontal whitespace char
  • ([A-Z]{1,3}) Capture 1-3 occurrences of A-Z in group 2
  • [-*][A-Z]{5} Match either - or * and 5 occurrences of A-Z
  • $ End of string

In Java

String regex = "^([A-Z]{1,3})\\h([A-Z]{1,3})[-*][A-Z]{5}$";
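As a minimal sketch of how the two groups could be read in one pass, the example below uses the pattern with a single \h and no literal space after it, as the breakdown above describes. The class name PlateParser and the method parse are hypothetical names chosen for this example, and \h assumes Java 8 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlateParser {
    // Group 1 is the country, group 2 is the district.
    private static final Pattern PLATE =
            Pattern.compile("^([A-Z]{1,3})\\h([A-Z]{1,3})[-*][A-Z]{5}$");

    // Hypothetical helper: returns {country, district}, or null if the plate does not match.
    static String[] parse(String plate) {
        Matcher matcher = PLATE.matcher(plate);
        if (!matcher.matches()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String[] plates = { "D HG-ABCDE", "A S-FGHIJ", "A RO*UVWXY" };
        for (String plate : plates) {
            String[] parts = parse(plate);
            System.out.println(plate + " : country=" + parts[0] + ", district=" + parts[1]);
        }
    }
}
```

Because the delimiters sit outside the capture groups, the replace calls that strip * and - afterwards are no longer needed.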


To capture only the two groups without matching the chars at the end, and to start at a word boundary instead of the start of the string:

\b([A-Z]{1,3})\h([A-Z]{1,3})[-*]
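Since this variant is not anchored at both ends, it would be used with find() rather than matches(). A minimal sketch follows, assuming a single \h with no literal space after it; PlatePrefixDemo and parsePrefix are names invented for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlatePrefixDemo {
    // Unanchored variant: word boundary, country, one horizontal space, district, delimiter.
    private static final Pattern PREFIX =
            Pattern.compile("\\b([A-Z]{1,3})\\h([A-Z]{1,3})[-*]");

    // Hypothetical helper: returns {country, district} for the first match, or null if none.
    static String[] parsePrefix(String text) {
        Matcher matcher = PREFIX.matcher(text);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parsePrefix("A BR-PQRST");
        System.out.println("country=" + parts[0] + ", district=" + parts[1]);
    }
}
```

The trade-off is that this version does not validate what follows the delimiter, so it accepts any text after the * or -.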

