Identify double byte character in a string and convert that into a single byte character

Time:10-03

In my Go project, I am dealing with Asian languages, and there are double-byte characters. In my case, I have a string that contains two words with a space between them.

E.g.: こんにちは　世界

Now I need to check if that space is a double byte space and if so, I need to convert that into single byte space.

I have searched a lot, but I couldn't find a way to do this. Since I cannot figure out a way to do this, sorry I have no code sample to add here.

Do I need to loop through each character and pick the double byte space using its code and replace? What is the code I should use to identify double byte space?

CodePudding user response:

Just replace?

package main

import (
    "fmt"
    "strings"
)

func main() {
    // "\u3000" is the full-width (ideographic) space; " " is the ASCII space.
    fmt.Println(strings.Replace("こんにちは\u3000世界", "\u3000", " ", -1))
}

Notice that the second argument to Replace is the full-width (ideographic) space, U+3000, copied from the string in your example. Replace finds every occurrence of it in the original string and replaces it with an ASCII space.

CodePudding user response:

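Go has no notion of a "double-byte character". Strings are sequences of UTF-8 bytes, and the special type rune (an alias for int32 under the hood) represents a single Unicode code point.

Your special space is code point 12288 (U+3000, the ideographic space), and the normal space is 32 (U+0020). In UTF-8 the ideographic space takes three bytes, not two.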

To iterate over characters you can use range

for _, char := range chars {...} // char is rune type
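As a minimal sketch of the loop above, the following program checks whether a string contains the ideographic space. The names ideographicSpace and hasIdeographicSpace are made up for illustration and are not part of any library.

```go
package main

import "fmt"

// ideographicSpace is U+3000 (decimal 12288), the full-width space used in CJK text.
// The constant name is hypothetical, chosen for this example.
const ideographicSpace = '\u3000'

// hasIdeographicSpace reports whether s contains at least one U+3000 rune.
// (Hypothetical helper, for illustration only.)
func hasIdeographicSpace(s string) bool {
	for _, r := range s { // range decodes UTF-8, so r is a rune, not a byte
		if r == ideographicSpace {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasIdeographicSpace("こんにちは\u3000世界")) // true
	fmt.Println(hasIdeographicSpace("こんにちは 世界"))      // false
}
```

Writing the space as the escape \u3000 rather than pasting the character keeps the two kinds of space visibly distinct in the source.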

To replace this character you can use strings.Replace, or strings.Map with a function that replaces the unwanted characters.

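// converter maps the ideographic space (12288, U+3000) to an ASCII space (32).
func converter(r rune) rune {
    if r == 12288 {
        return 32
    }
    return r
}

// Inside a function; "\u3000" makes the full-width space explicit.
result := strings.Map(converter, "こんにちは\u3000世界")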

It is also possible to use character literals instead of numbers

if r == '\u3000' { // ideographic (full-width) space
    return ' '     // ASCII space
}
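Putting the pieces together, here is a small runnable sketch of the strings.Map approach using rune literals. The helper names toASCIISpace and normalizeSpaces are assumptions made for this example. It handles only U+3000; other full-width characters would need their own cases.

```go
package main

import (
	"fmt"
	"strings"
)

// toASCIISpace maps the ideographic space (U+3000) to an ASCII space (U+0020)
// and returns every other rune unchanged. (Hypothetical helper name.)
func toASCIISpace(r rune) rune {
	if r == '\u3000' {
		return ' '
	}
	return r
}

// normalizeSpaces applies toASCIISpace to every rune in s. (Hypothetical helper name.)
func normalizeSpaces(s string) string {
	return strings.Map(toASCIISpace, s)
}

func main() {
	in := "こんにちは\u3000世界"
	out := normalizeSpaces(in)
	fmt.Printf("%q\n", out)
	// len counts bytes: U+3000 is 3 bytes in UTF-8, the ASCII space is 1.
	fmt.Println(len(in), len(out)) // 24 22
}
```

For a single fixed character, strings.ReplaceAll(s, "\u3000", " ") (available since Go 1.12) gives the same result. strings.Map becomes more convenient once several characters need to be mapped at once.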