We are currently in the process of building a system that stores text in a PostgreSQL DB via Django. The data gets then extracted via PGSync to ElasticSearch.
At the moment we have encountered the following issue in a testcase
Error Message:
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 159-160: surrogates not allowed
We identified the character that causes that issue. It is an emoji.
The text itself is a mixture of Greek Characters, "English Characters" and as it seems emojis. The greek is not shown as greek, but instead in the \u
form.
Relevant Text that causes the issue:
\u03bc\u03b5 Some English Text \ud83d\ude9b\n#SomeHashTag
\ud83d\ude9b\
translates to this emoji: