Matching multiple unicode characters in Golang Regexp

Time:05-22

As a simplified example, I want the pattern ^⬛+$ to match ⬛⬛⬛, so that FindString returns ⬛⬛⬛.

    r := regexp.MustCompile("^⬛+$")
    matches := r.FindString("⬛️⬛️⬛️")
    fmt.Println(matches)

But it doesn't match, even though the equivalent pattern works with ordinary ASCII characters.

I'm guessing there's something I don't know about Unicode matching, but I haven't found a decent explanation in the documentation yet.

Can someone explain the problem?


Answer:

The regular expression matches a string consisting only of one or more ⬛ (U+2B1B BLACK LARGE SQUARE).
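Here is a minimal sketch showing that Go's regexp package handles the multi-byte character itself correctly. It runs the same pattern against three bare U+2B1B runes with no variation selectors. The variable names are my own, for illustration only.

```go
package main

import (
	"fmt"
	"regexp"
)

// blackSquares matches a string made up only of one or more
// U+2B1B BLACK LARGE SQUARE runes. Go's regexp package works on
// UTF-8 encoded runes, so "+" repeats the whole three-byte character.
var blackSquares = regexp.MustCompile("^\u2b1b+$")

func main() {
	plain := "\u2b1b\u2b1b\u2b1b"                        // ⬛⬛⬛, no variation selectors
	fmt.Println(blackSquares.FindString(plain) == plain) // true
	fmt.Println(len(plain))                              // 9: three 3-byte runes
}
```

Because the pattern is built from a single rune, the failure in the question comes from extra characters in the subject string, not from how multi-byte characters are repeated.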

The subject string, however, is three pairs of ⬛ followed by VARIATION SELECTOR-16 (U+FE0F). The variation selectors are invisible (at least in my terminal), and because the pattern makes no allowance for them, they prevent a match.
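One way to confirm what the subject string actually contains is to print the code point of each rune. The sketch below assumes the subject is exactly three ⬛ + U+FE0F pairs, as described above. It writes the string with \u escapes so the invisible selectors are visible in the source. describeRunes is a hypothetical helper, not part of any library.

```go
package main

import "fmt"

// describeRunes returns the code point of each rune in s, formatted as U+XXXX.
func describeRunes(s string) []string {
	var out []string
	for _, r := range s { // ranging over a string yields runes, not bytes
		out = append(out, fmt.Sprintf("%U", r))
	}
	return out
}

func main() {
	// Assumed subject: each square followed by VARIATION SELECTOR-16.
	subject := "\u2b1b\ufe0f\u2b1b\ufe0f\u2b1b\ufe0f"
	fmt.Println(describeRunes(subject))
	// prints [U+2B1B U+FE0F U+2B1B U+FE0F U+2B1B U+FE0F]
}
```

Pasting the literal string from the question into describeRunes instead of the escaped version is a quick way to check for hidden characters in any input.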

Fix this either by removing the variation selectors from the subject string or by adding the variation selector to the pattern.
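Both fixes can be sketched in a few lines. This assumes the same three-pair subject string, and the helper names are illustrative only. In the second pattern, making the selector optional with ? is my own choice, so that squares with and without U+FE0F both match. The answer only says to add the selector to the pattern.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

const vs16 = "\ufe0f" // VARIATION SELECTOR-16

// Fix 1: strip the variation selectors before matching.
var plainSquares = regexp.MustCompile("^\u2b1b+$")

func matchStripped(s string) bool {
	return plainSquares.MatchString(strings.ReplaceAll(s, vs16, ""))
}

// Fix 2: allow an optional U+FE0F after each square in the pattern itself.
// The non-capturing group makes "+" repeat the square and its selector together.
var squaresWithVS = regexp.MustCompile(`^(?:\x{2B1B}\x{FE0F}?)+$`)

func main() {
	subject := "\u2b1b\ufe0f\u2b1b\ufe0f\u2b1b\ufe0f"
	fmt.Println(plainSquares.MatchString(subject))            // false: the original problem
	fmt.Println(matchStripped(subject))                       // true
	fmt.Println(squaresWithVS.MatchString(subject))           // true
	fmt.Println(squaresWithVS.FindString(subject) == subject) // true
}
```

Stripping keeps the pattern simple but changes the string you match against. Extending the pattern leaves the input untouched, so FindString returns the original text, selectors included.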

Here's the first fix: https://go.dev/play/p/oKIVnkC7TZ1
