Is it possible for str.encode(encoding='utf-8', errors='strict') to raise Unicod

Time:03-30

I am writing some code that needs to work with both Py2.7 and Py3.7.

I need to write text to a file using UTF-8 encoding. My code looks like this:

import six
...
content = ...
if isinstance(content, six.string_types):
    content = content.encode(encoding='utf-8', errors='strict')

# write 'content' to file

Above, is it possible for content.encode() to raise UnicodeError on either Py2.7 or Py3.7? I cannot think of a scenario where this is possible, but I am not a Python expert, so I suspect there must be an edge case.

Here is my reasoning why I think it will never raise UnicodeError:

  • six.string_types covers three types: Py2.7 str & unicode, Py3.7 str
  • All of these types can always encode as UTF-8.

CodePudding user response:

Yes, it's possible:

import six

content = ''.join(map(chr, range(0x110000)))
if isinstance(content, six.string_types):
    content = content.encode(encoding='utf-8', errors='strict')

Result (using Python 3.7.4):

Traceback (most recent call last):
  File ".code.tio", line 5, in <module>
    content = content.encode(encoding='utf-8', errors='strict')
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 55296-57343: surrogates not allowed

And UnicodeEncodeError is a subclass of UnicodeError, so catching UnicodeError catches it too.
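
A smaller reproduction may be easier to reason about than the full code point range. The following is only a sketch for Python 3: the helper name encode_strict and the sample strings are made up for illustration and are not from the question. It shows one realistic way a lone surrogate can end up in a str (decoding undecodable bytes with errors='surrogateescape') and checks the subclass relationship directly. The last part only imitates, on Python 3, the implicit ASCII decode that Py2.7 performs when .encode() is called on a byte str containing non-ASCII bytes; it is an emulation, not Py2.7 code.

```python
# Python 3 sketch; encode_strict is a hypothetical helper, not from the question.

def encode_strict(text):
    """Return (encoded_bytes, None) on success, or (None, error) on UnicodeError."""
    try:
        return text.encode(encoding='utf-8', errors='strict'), None
    except UnicodeError as exc:
        return None, exc

# UnicodeEncodeError derives from UnicodeError, so the except clause above catches it.
encode_error_is_unicode_error = issubclass(UnicodeEncodeError, UnicodeError)

# Undecodable bytes decoded with 'surrogateescape' become lone surrogates in the str.
text_with_surrogate = b'caf\xff'.decode('utf-8', errors='surrogateescape')

# Strict UTF-8 encoding rejects the lone surrogate.
encoded, error = encode_strict(text_with_surrogate)

# Ordinary text, including non-ASCII characters, encodes without error.
ok_bytes, ok_error = encode_strict('caf\u00e9')

# Emulation only: on Py2.7, calling .encode() on a byte str first decodes it
# as ASCII, which fails for non-ASCII bytes. Reproduced here with an explicit decode.
try:
    b'caf\xc3\xa9'.decode('ascii')
    py2_style_error = None
except UnicodeError as exc:
    py2_style_error = exc
```

If lone surrogates are expected in the input, passing errors='surrogatepass' or errors='replace' instead of 'strict' avoids the exception, at the cost of writing bytes that are not strictly valid UTF-8 or losing the original character.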
