In Postgres I have a column of json type with UTF-8 encoding. I am inserting JSON from Python with psycopg2:
cursor.execute("INSERT INTO my_table (json_data) VALUES (%s)", (Json(data),))
The result in the database is
{"item": "r\u00f6b"}
Meaning ö appears as \u00f6. When I insert the same item manually through pgAdmin, it appears normally as ö. The documentation says that psycopg2.extras.Json uses json.dumps internally, which is what produces this escaping. Could you suggest a solution that allows storing {"item": "röb"} without the \u escapes?
CodePudding user response:
This is due to json.dumps defaulting to ensure_ascii=True. Per the json module documentation:
If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is. ...
As permitted, though not required, by the RFC, this module’s serializer sets ensure_ascii=True by default, thus escaping the output so that the resulting strings only contain ASCII characters.
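It is worth noting that both outputs are valid JSON encoding the same value; the escaped form is only a presentation difference. A minimal demonstration with the standard json module:

```python
import json

escaped = json.dumps({"item": "röb"})                  # default: ensure_ascii=True
raw = json.dumps({"item": "röb"}, ensure_ascii=False)  # non-ASCII emitted as-is

print(escaped)  # {"item": "r\u00f6b"}
print(raw)      # {"item": "röb"}

# Both forms parse back to the identical object:
print(json.loads(escaped) == json.loads(raw))  # True
```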
The solution is to override the default by passing a function that calls json.dumps(ensure_ascii=False) via the dumps argument of psycopg2.extras.Json. Derived from the examples in the psycopg2 JSON adaptation documentation:
import json
from psycopg2.extras import Json

# Default behavior escapes non-ASCII characters:
json.dumps('ö')                      # '"\\u00f6"'
# Disabling ensure_ascii keeps them as-is:
json.dumps('ö', ensure_ascii=False)  # '"ö"'

# Pass the override to Json via its dumps argument:
dumps_lambda = lambda x: json.dumps(x, ensure_ascii=False)
j = Json('ö', dumps=dumps_lambda)
j.adapted  # 'ö'
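As an aside, assigning a lambda to a name is something linters flag (PEP 8 recommends def or functools.partial); functools.partial expresses the same override. A minimal sketch (the name dumps_utf8 is chosen here for illustration, not part of psycopg2):

```python
import json
from functools import partial

# Equivalent to the lambda above: json.dumps with ensure_ascii pre-bound.
dumps_utf8 = partial(json.dumps, ensure_ascii=False)

print(dumps_utf8({"item": "röb"}))  # {"item": "röb"}
```

This can then be passed to psycopg2 in the original INSERT as Json(data, dumps=dumps_utf8), so the value reaches the database without \u escapes.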