UTF8 characters incorrect after selecting from postgres

Time:11-29

I have a table in Postgres containing email addresses. One of the customers has an umlaut (ü) in their email address. This shouldn't be an issue, but the string I get in Go contains the wrong byte sequence (E3BC instead of C3BC), which causes a number of problems later on.

I'm connecting to the database with client_encoding=UTF8, and the database is set up for UTF8. If I run the following query, I can see that the byte sequence stored in the database is as expected:

SELECT encode("email"::bytea, 'hex') FROM participants WHERE email LIKE 'XXXXXX%';
 encode
--------
 c3bc

(the rest of the data has been hidden)

I read the data using the database/sql package and the Postgres driver. If I print the string's bytes in Go, I get XXXXXXe3bcXXXXXX, which is not what I expect (again, the rest of the email is hidden with X's).
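For reference, a minimal sketch of how the bytes of a Go string can be inspected, independent of any database. The helper names hexBytes and describe are made up for this illustration, and the two literals simply stand in for the expected and the observed sequences:

```go
package main

import (
	"encoding/hex"
	"fmt"
	"unicode/utf8"
)

// hexBytes returns the raw bytes of s as lowercase hex,
// comparable to encode(col::bytea, 'hex') on the Postgres side.
func hexBytes(s string) string {
	return hex.EncodeToString([]byte(s))
}

// describe reports the hex dump of s and whether s is valid UTF-8.
func describe(s string) string {
	return fmt.Sprintf("%s valid=%t", hexBytes(s), utf8.ValidString(s))
}

func main() {
	expected := "\xc3\xbc" // UTF-8 encoding of ü (U+00FC)
	observed := "\xe3\xbc" // the sequence seen in the Go string

	fmt.Println(describe(expected)) // c3bc valid=true
	fmt.Println(describe(observed)) // e3bc valid=false
}
```

E3 is the lead byte of a three-byte UTF-8 sequence, so E3BC on its own is not valid UTF-8. Dumping the value right after rows.Scan in the same way would show whether the bytes are already wrong when they leave the driver or are changed by later code.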

Is this a bug, or am I misunderstanding something?

CodePudding user response:

Make sure your database really is set up for UTF8. The encoding and locale settings are fixed when the database is created, and a mismatch can cause issues with SQL functions like LOWER. If they are wrong, re-create the cluster with pg_dropcluster and pg_createcluster --encoding=UTF8. Dump your data first, since pg_dropcluster deletes everything in the cluster.
