RegExp `Range out of order in character class`

Time:11-23

I have this RegExp:

[\u{1f300}-\u{1f5ff}\u{1f900}-\u{1f9ff}\u{1f600}-\u{1f64f}\u{1f680}-\u{1f6ff}\u{2600}-\u{26ff}\u{2700}-\u{27bf}\u{1f1e6}-\u{1f1ff}\u{1f191}-\u{1f251}\u{1f004}\u{1f0cf}\u{1f170}-\u{1f171}\u{1f17e}-\u{1f17f}\u{1f18e}\u{3030}\u{2b50}\u{2b55}\u{2934}-\u{2935}\u{2b05}-\u{2b07}\u{2b1b}-\u{2b1c}\u{3297}\u{3299}\u{303d}\u{00a9}\u{00ae}\u{2122}\u{23f3}\u{24c2}\u{23e9}-\u{23ef}\u{25b6}\u{23f8}-\u{23fa}\u{200d}]

This RegExp works on https://regex101.com, but when I use it in JavaScript or Dart, I get a `Range out of order in character class` error.

I'm fairly sure this is a string escaping error, but I can't find the problem.

I already tried a raw string in Dart (r"...") and escaping the backslash (\u{1f300} --> \\u{1f300}).

CodePudding user response:

Unicode matching

As pointed out in the comments, matching unicode characters requires the unicode flag in regular expressions.

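If you try to simply match these characters using RegExp('[\u{1f300}-\u{1f5ff}]'), this will fail, and fixing it takes two steps.

  1. The escapes must reach the regex unresolved. In a normal string literal, Dart replaces \u{1f300} with the character itself; without the unicode flag, the regex then sees it as two UTF-16 surrogates, and the range between two such pairs is out of order. Use a raw string instead: RegExp(r'[\u{1f300}-\u{1f5ff}]').
  2. This will still not work, because without the unicode flag the regex does not recognize the braced escape and evaluates every character on its own (u, {, 1, and so on). This is where the unicode flag comes into play: RegExp(r'[\u{1f300}-\u{1f5ff}]', unicode: true).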

Note that code points above U+FFFF (more than four hex digits) need curly braces, e.g. RegExp(r'\u{1f300}', unicode: true). See this question for more information.
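Since the question also mentions JavaScript, here is a minimal sketch of both failure modes and the fix in JavaScript, whose regex syntax Dart's RegExp shares. It uses only a small slice of the character class; the helper name compileError is made up for illustration.

```javascript
// The pattern text exactly as the regex engine receives it.
// String.raw keeps the backslashes, much like a Dart raw string.
const source = String.raw`[\u{1f300}-\u{1f5ff}]`;

// Hypothetical helper: returns the error thrown while compiling, or null.
function compileError(pattern, flags) {
  try {
    new RegExp(pattern, flags);
    return null;
  } catch (e) {
    return e;
  }
}

// Raw escapes, no unicode flag: \u{1f300} is read as the literal characters
// u, {, 1, f, ... so the class contains the range "}-u", which is backwards.
const rawWithoutFlag = compileError(source, '');

// Resolved characters, no unicode flag: each emoji is two UTF-16 surrogates,
// so the class contains a range from a low surrogate to a high surrogate.
const resolvedWithoutFlag = compileError('[\u{1f300}-\u{1f5ff}]', '');

// With the unicode flag, each \u{...} is a single code point.
const withFlag = new RegExp(source, 'u');
const matchesCyclone = withFlag.test('\u{1f300}');
const matchesLetter = withFlag.test('a');
```

Both calls without the flag throw while the pattern is being compiled, which is where the "Range out of order in character class" message comes from.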


This means that your final regex should look like this:

RegExp(
  r'[\u{1f300}-\u{1f5ff}\u{1f900}-\u{1f9ff}\u{1f600}-\u{1f64f}'
  r'\u{1f680}-\u{1f6ff}\u{2600}-\u{26ff}\u{2700}'
  r'-\u{27bf}\u{1f1e6}-\u{1f1ff}\u{1f191}-\u{1f251}'
  r'\u{1f004}\u{1f0cf}\u{1f170}-\u{1f171}\u{1f17e}'
  r'-\u{1f17f}\u{1f18e}\u{3030}\u{2b50}\u{2b55}'
  r'\u{2934}-\u{2935}\u{2b05}-\u{2b07}\u{2b1b}'
  r'-\u{2b1c}\u{3297}\u{3299}\u{303d}\u{00a9}'
  r'\u{00ae}\u{2122}\u{23f3}\u{24c2}\u{23e9}'
  r'-\u{23ef}\u{25b6}\u{23f8}-\u{23fa}\u{200d}]+',
  unicode: true,
);
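For the JavaScript side of the question, a rough equivalent could look like the sketch below. It assumes the same character class, a trailing + quantifier, and the u flag (plus g so that match returns every run); the variable names are illustrative only.

```javascript
// The character class split across several raw strings for readability.
const parts = [
  String.raw`[\u{1f300}-\u{1f5ff}\u{1f900}-\u{1f9ff}\u{1f600}-\u{1f64f}`,
  String.raw`\u{1f680}-\u{1f6ff}\u{2600}-\u{26ff}\u{2700}-\u{27bf}`,
  String.raw`\u{1f1e6}-\u{1f1ff}\u{1f191}-\u{1f251}\u{1f004}\u{1f0cf}`,
  String.raw`\u{1f170}-\u{1f171}\u{1f17e}-\u{1f17f}\u{1f18e}\u{3030}`,
  String.raw`\u{2b50}\u{2b55}\u{2934}-\u{2935}\u{2b05}-\u{2b07}`,
  String.raw`\u{2b1b}-\u{2b1c}\u{3297}\u{3299}\u{303d}\u{00a9}\u{00ae}`,
  String.raw`\u{2122}\u{23f3}\u{24c2}\u{23e9}-\u{23ef}\u{25b6}`,
  String.raw`\u{23f8}-\u{23fa}\u{200d}]+`,
];

// 'u' makes \u{...} a code point escape; 'g' finds every run in the input.
const emoji = new RegExp(parts.join(''), 'gu');

// Two adjacent emojis (U+1F600, U+1F680) come back as a single run.
const runs = 'Hi \u{1f600}\u{1f680}!'.match(emoji);

// A family emoji is three code points joined by U+200D, all in the class.
const family = '\u{1f468}\u{200d}\u{1f469}\u{200d}\u{1f467}';
const familyRuns = family.match(emoji);
```

Because of the + quantifier, neighbouring emojis are returned together as one match rather than one match per emoji.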

Grapheme clusters

Another problem you might encounter is that the initial regex cannot match emojis that span multiple characters. Note that in the snippet above, I added a + at the end in order to match them.

In order to match single emojis, run the regex on each user-perceived character of the string, as defined by grapheme clusters. This can be achieved using package:characters.

An example implementation can be found here.
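As a rough JavaScript counterpart, the sketch below splits a string into grapheme clusters and keeps those made up entirely of emoji code points. It swaps in the built-in Intl.Segmenter as a stand-in for package:characters, uses a shortened character class for brevity, and the function name emojiClusters is made up for illustration.

```javascript
// Shortened class for illustration; anchored so the whole cluster must match.
// No 'g' flag, so test() keeps no state between calls.
const emojiCluster = /^[\u{1f300}-\u{1f5ff}\u{1f600}-\u{1f64f}\u{200d}]+$/u;

// Splits text into user-perceived characters (grapheme clusters).
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });

// Hypothetical helper: returns every grapheme cluster that is an emoji.
function emojiClusters(text) {
  const clusters = [];
  for (const { segment } of segmenter.segment(text)) {
    if (emojiCluster.test(segment)) {
      clusters.push(segment);
    }
  }
  return clusters;
}

const family = '\u{1f468}\u{200d}\u{1f469}\u{200d}\u{1f467}';
const found = emojiClusters('a\u{1f600}' + family + 'b');
```

Here the smiley and the family emoji are reported as two separate clusters, even though they are adjacent and the family emoji consists of five code points.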
