How can I match a string containing ä (a with umlaut) using regex in Dart/Flutter?

Time:05-05

I am trying to match the string "06 März 2021" with the regex:

r"(\d{1,2})\W(\p{L}{3,20})\W(\d{4})"

I tried telling the Regex to use Unicode:

RegExp(datePattern, unicode: true);

But that doesn't work for ä. It does for some other accented characters though.

Help would be appreciated. Thanks.

CodePudding user response:

Debugging showed me that the ä in my string is stored as 2 code points: an a followed by the combining umlaut mark (U+0308).

Because the following 2 strings are not identical (unless stackoverflow messes with the text I type):

März
März

In the first case the ä is composed from 2 characters, the a and the umlaut. In the 2nd, it's a single character. This can be checked by printing the lengths of the 2 strings (first is 5, second is 4).
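The difference can be made visible by writing both forms with explicit escapes, so that no editor or website can silently normalize them. This is a minimal sketch; the names `decomposed`, `precomposed` and `codePoints` are made up for illustration:

```dart
// Decomposed form: 'a' followed by U+0308 (combining diaeresis).
const decomposed = 'Ma\u0308rz';

// Precomposed form: the single code point U+00E4.
const precomposed = 'M\u00E4rz';

// Hypothetical helper: lists the code points of a string as U+XXXX.
String codePoints(String s) => s.runes
    .map((r) => 'U+${r.toRadixString(16).toUpperCase().padLeft(4, '0')}')
    .join(' ');

void main() {
  // String.length counts UTF-16 code units, so the two forms differ.
  print(decomposed.length); // 5
  print(precomposed.length); // 4

  // Both render as "März", but they are not equal as Dart strings.
  print(decomposed == precomposed); // false

  print(codePoints(decomposed)); // U+004D U+0061 U+0308 U+0072 U+007A
  print(codePoints(precomposed)); // U+004D U+00E4 U+0072 U+007A
}
```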

After finding this link: https://www.regular-expressions.info/unicode.html#category

I realized that I needed to add the mark class of characters to the regex, so what I ended up with is:

r"(\d{1,2})\s([\p{L}\p{M}]{3,20})\s(\d{4})"
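As a rough check, the sketch below runs the pattern from the question and the pattern with the added mark class against both forms of the date. The variable names are made up for illustration, and the inputs use explicit escapes so the form of the ä is unambiguous:

```dart
// Pattern from the question: letters only.
final lettersOnly =
    RegExp(r'(\d{1,2})\W(\p{L}{3,20})\W(\d{4})', unicode: true);

// Pattern from this answer: letters plus combining marks.
final lettersAndMarks =
    RegExp(r'(\d{1,2})\s([\p{L}\p{M}]{3,20})\s(\d{4})', unicode: true);

void main() {
  const decomposed = '06 Ma\u0308rz 2021'; // a + U+0308
  const precomposed = '06 M\u00E4rz 2021'; // single U+00E4

  // The letters-only pattern handles the precomposed ä but not the
  // decomposed one: U+0308 is a mark (\p{M}), not a letter (\p{L}),
  // so only two letters are available before it.
  print(lettersOnly.hasMatch(precomposed)); // true
  print(lettersOnly.hasMatch(decomposed)); // false

  // With \p{M} in the character class, both forms match.
  print(lettersAndMarks.hasMatch(precomposed)); // true
  final m = lettersAndMarks.firstMatch(decomposed);
  print(m?.group(1)); // 06
  print(m?.group(3)); // 2021
}
```

Note that the month captured from the decomposed input still contains the separate combining mark, so it will not compare equal to a precomposed "März" unless one of the two is normalized first.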

An alternative would be using canonical decomposition followed by canonical composition on the string using https://pub.dev/packages/unorm_dart

This would turn the first string into the second (a single character for ä instead of 2).

NOTE: This works for letters with umlauts, but I don't know which other accented letters it also works for.

edit: replaced \W in the regex with \s so it only matches space characters (as suggested by The fourth bird)
