Given a binary string in Python like
bstring = b'hello'
why does bstring[0] return the ASCII code for the char 'h' (104) and not the binary char b'h' (or b'\x68')?
It's probably also worth noting that b'h' == 104 returns False
(this cost me about 2 hours of debugging, so I'm a little annoyed)
CodePudding user response:
Because bytes are not characters.
Indexing a bytes object returns the value of the byte at that position, as an integer. (Slicing, such as bstring[0:1], returns a bytes object instead.)
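As a quick sketch of the difference between indexing and slicing, using the bstring from the question (the names first_int and first_bytes are only illustrative):

```python
bstring = b'hello'

first_int = bstring[0]       # indexing gives an int
first_bytes = bstring[0:1]   # slicing gives a length-1 bytes object

print(first_int)             # 104
print(first_bytes)           # b'h'

# A bytes object never compares equal to an int, so compare like with like.
print(b'h' == 104)             # False
print(bstring[0] == ord('h'))  # True
print(bstring[0:1] == b'h')    # True
print(bytes([104]) == b'h')    # True
```

If you want the b'h' behaviour from the question, slicing with bstring[i:i+1] is probably the simplest way to get it.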
If you take 'hello', this is quite simple: 5 ASCII characters -> 5 bytes:
b'hello' == 'hello'.encode('utf-8')
# True
len('hello'.encode('utf-8'))
# 5
If you were to use non-ASCII characters, those could be encoded as several bytes, and indexing or slicing could give you only part of a character:
len('å'.encode('utf-8'))
# 2
'å'.encode('utf-8')[0]
# 195
'å'.encode('utf-8')[1]
# 165
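To see why a single byte of a multi-byte character is not a character on its own, here is a small sketch that tries to decode only the first byte of 'å' (the names encoded and partial_ok are illustrative):

```python
encoded = 'å'.encode('utf-8')   # b'\xc3\xa5': two bytes for one character

try:
    encoded[:1].decode('utf-8')  # only the first of the two bytes
    partial_ok = True
except UnicodeDecodeError:
    partial_ok = False

print(partial_ok)               # False
print(encoded.decode('utf-8'))  # å
```

Decoding works only once all the bytes of the character are present.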
CodePudding user response:
Think of bytes less as a “string” and more as an immutable list (or tuple) with the constraint that all elements be integers in range(256).
So, think of:
>>> bstring = b'hello'
>>> bstring[0]
104
as being equivalent to
>>> btuple = (104, 101, 108, 108, 111)
>>> btuple[0]
104
except with a different sequence type.
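The "immutable sequence of integers in range(256)" view can be checked directly. This is only a sketch; the flag names and the bytearray at the end are additions for illustration, not part of the original example:

```python
bstring = b'hello'

# A bytes object converts to and from a sequence of small integers.
print(list(bstring))                                # [104, 101, 108, 108, 111]
print(bytes((104, 101, 108, 108, 111)) == bstring)  # True

# Elements outside range(256) are rejected.
try:
    bytes([256])
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True
print(out_of_range_rejected)                        # True

# Like a tuple, bytes does not support item assignment.
try:
    bstring[0] = 72
    assignment_rejected = False
except TypeError:
    assignment_rejected = True
print(assignment_rejected)                          # True

# bytearray is the mutable counterpart.
mutable = bytearray(bstring)
mutable[0] = 72
print(mutable)                                      # bytearray(b'Hello')
```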
It's actually str that behaves weirdly in Python. If you index a str, you don't get a char object like you would in some other languages; you get another str.
>>> string = 'hello'
>>> string[0]
'h'
>>> type(string[0])
<class 'str'>
>>> string[0][0]
'h'
>>> string[0][0][0]
'h'
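The same contrast shows up when iterating. A minimal sketch, with illustrative variable names:

```python
str_item_types = [type(c).__name__ for c in 'hi']
bytes_item_types = [type(b).__name__ for b in b'hi']

print(str_item_types)    # ['str', 'str']
print(bytes_item_types)  # ['int', 'int']

# A one-character str is its own first element, which is why the
# indexing can be repeated indefinitely.
print('h'[0] == 'h')     # True
```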