I am updating some Python2 code written by others, and this part:
    def exec(self, content, query):
        # query = "city_68"
        content = content.strip().strip(',').decode('utf-8', 'ignore')
        query = query.decode('utf-8', 'ignore')
        query_list = query.split('|')
This gives an error in Python3:
      File "/Users/cong/bexec.py", line 708, in bexec
        content = content.strip().strip(',').decode('utf-8', 'ignore')
    AttributeError: 'str' object has no attribute 'decode'
The parameters content and query are both strings. So I removed the decode part:
        content = content.strip().strip(',')
        # query = query.decode('utf-8', 'ignore')
Now it doesn't complain any more. Is this safe to do? I guess in Python3 it doesn't need decode() any more.
CodePudding user response:
Correct. In Python 3, if you have a str value, you can assume it is a proper sequence of Unicode code points, not a sequence of bytes that needs to be decoded from (say) UTF-8 to a Unicode string. If you have a bytes value, you must decode it first in order to get a proper Unicode string.
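If the method might still be called with bytes from some code paths, one way to stay safe is to decode only when needed. The following is a minimal sketch, not the original author's code: to_text and split_query are hypothetical names introduced here for illustration, and UTF-8 with the 'ignore' error handler is assumed because that is what the Python 2 code used.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first only if it is bytes.

    Hypothetical helper, not part of the original code.
    """
    if isinstance(value, bytes):
        # Same arguments as the original decode('utf-8', 'ignore') call:
        # undecodable bytes are silently dropped.
        return value.decode(encoding, "ignore")
    return value  # already a str, so there is nothing to decode


def split_query(content, query):
    # Hypothetical stand-in for the body of the method in the question.
    content = to_text(content).strip().strip(",")
    query_list = to_text(query).split("|")
    return content, query_list
```

With this, split_query accepts either str or bytes arguments. If every caller is known to pass str, simply dropping the decode calls, as in the question, is enough.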
In Python 2, the boundaries were looser. A unicode value was definitely a proper Unicode string (and was renamed str in Python 3), while a str value could be a "real" ASCII-only string value or arbitrary binary data: you couldn't tell just from the type.
As such, the str type supported encode and decode methods to allow switching between the two sides of the str type.
In Python 3, with more strictly defined roles, you can call str.encode to get a bytes value, or you can call bytes.decode to get a str value. You cannot decode a str or further encode a bytes. str.decode and bytes.encode simply do not exist.
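A short sketch of the two directions in Python 3 follows. The sample value "Zürich|city_68" is made up for illustration; only str.encode, bytes.decode, and hasattr are used.

```python
text = "Zürich|city_68"              # str: a sequence of Unicode code points
data = text.encode("utf-8")          # str -> bytes
round_trip = data.decode("utf-8")    # bytes -> str

# The ü takes two bytes in UTF-8, so the bytes value is one element longer.
print(len(text), len(data))

# The reverse-direction methods are absent in Python 3.
has_str_decode = hasattr(text, "decode")     # False
has_bytes_encode = hasattr(data, "encode")   # False
print(has_str_decode, has_bytes_encode)
```

This is why the original line raised AttributeError: the value was already a str, and a str has no decode method to call.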
In some sense, all files are binary files: they consist of a stream of bytes. What we call a text file is just a file whose bytes are intended to be decoded using a particular text decoder, like ASCII or UTF-8, as opposed to something like a JPEG decoder, or a JVM, or your CPU itself.
When you use open to open a file in text mode (the default), its read method returns str values, which result from applying the file object's decoder to the raw bytes read from the file.
When you use open to open a file in binary mode, its read method returns bytes values, the raw bytes being left undecoded for you to handle as you see fit.
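The difference between the two modes can be seen by reading the same file twice. This is a small sketch that assumes a throwaway temporary file with made-up UTF-8 content; the encoding is passed explicitly because the text-mode default depends on the platform.

```python
import os
import tempfile

# Create a small UTF-8 encoded file in a temporary location.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("Zürich|city_68".encode("utf-8"))
    path = tmp.name

try:
    # Text mode: the file object decodes the bytes and read() returns str.
    with open(path, encoding="utf-8") as f:
        as_text = f.read()

    # Binary mode: read() returns the raw, undecoded bytes.
    with open(path, "rb") as f:
        as_bytes = f.read()
finally:
    os.remove(path)

print(type(as_text).__name__, type(as_bytes).__name__)
```

Data read in binary mode still needs as_bytes.decode("utf-8") before string methods such as split('|') can be used with str arguments; data read in text mode does not.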