I am writing some code that needs to work with both Py2.7 and Py3.7 .
I need to write text to a file using UTF-8 encoding. My code looks like this:
import six
...
content = ...
if isinstance(content, six.string_types):
content = content.encode(encoding='utf-8', errors='strict')
# write 'content' to file
Above, is it possible for content.encode()
to raise UnicodeError
from either Py2.7 or Py3.7 ? I cannot think of a scenario where this is possible. I am not a Python expert, so I think there there must be an edge case.
Here is my reasoning why I think it will never raise UnicodeError
:
six.string_types
covers three types: Py2.7str
&unicode
, Py3.7str
- All of these types can always encode as UTF-8.
CodePudding user response:
Yes, it's possible:
import six
content = ''.join(map(chr, range(0x110000)))
if isinstance(content, six.string_types):
content = content.encode(encoding='utf-8', errors='strict')
Result (Try it online!, using Python 3.7.4):
Traceback (most recent call last):
File ".code.tio", line 5, in <module>
content = content.encode(encoding='utf-8', errors='strict')
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 55296-57343: surrogates not allowed
And UnicodeEncodeError
s are UnicodeError
s.