Why does indexing a binary string return an integer in Python 3?

Given a binary string (a bytes object) in Python like

bstring = b'hello'

why does bstring[0] return the ASCII code for the char 'h' (104) and not the binary char b'h' or b'\x68'?

It's probably also good to note that b'h' == 104 returns False (this cost me about 2 hours of debugging, so I'm a little annoyed)

CodePudding user response：

Because bytes are not characters.

Indexing returns the value of the byte at that position, as an integer. Slicing, by contrast, returns another bytes object.
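As a rough sketch of the difference between indexing and slicing, using the same bstring as in the question (the other variable names are only for illustration):

```python
bstring = b'hello'

# Indexing gives the integer value of a single byte.
first_int = bstring[0]            # 104

# Slicing gives a bytes object, even when the slice is one byte long.
first_bytes = bstring[0:1]        # b'h'

# Converting between the two representations:
same_bytes = bytes([first_int])   # b'h'
same_int = ord(first_bytes)       # 104; ord() accepts a length-1 bytes object

print(first_int, first_bytes, same_bytes, same_int)
```

So if you want to compare against b'h', compare a one-byte slice (bstring[0:1] == b'h'), or compare the integer against ord(b'h').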

If you take 'hello', this is quite simple: 5 ASCII characters -> 5 bytes:

b'hello' == 'hello'.encode('utf-8')
# True

len('hello'.encode('utf-8'))
# 5

If you were to use non-ASCII characters, those could be encoded as several bytes, and indexing or slicing could give you only part of a character:

len('å'.encode('utf-8'))
# 2

'å'.encode('utf-8')[0]
# 195

'å'.encode('utf-8')[1]
# 165
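To see why a lone byte is not a character, here is a small sketch (variable names are illustrative) that tries to decode only the first byte of the encoded 'å':

```python
encoded = 'å'.encode('utf-8')   # two bytes: 195 and 165

# The full byte sequence decodes back to the original character.
whole = encoded.decode('utf-8')

# The first byte alone is an incomplete UTF-8 sequence.
try:
    encoded[:1].decode('utf-8')
    partial_ok = True
except UnicodeDecodeError:
    partial_ok = False

print(whole, partial_ok)
```

The first byte alone cannot be decoded, which is one reason an indexed byte is handed back as a plain number rather than as something character-like.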

CodePudding user response：

Think of bytes less as a “string” and more as an immutable list (or tuple) with the constraint that all elements are integers in range(256).

So, think of:

>>> bstring = b'hello'
>>> bstring[0]
104

as being equivalent to

>>> btuple = (104, 101, 108, 108, 111)
>>> btuple[0]
104

except with a different sequence type.
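A quick sketch of that equivalence, reusing bstring and btuple from above (the remaining names are just for illustration):

```python
bstring = b'hello'
btuple = (104, 101, 108, 108, 111)

# A bytes object converts to and from a sequence of integers.
as_tuple = tuple(bstring)      # (104, 101, 108, 108, 111)
round_trip = bytes(btuple)     # b'hello'

# Iterating over bytes yields integers, so numeric functions work directly.
total = sum(bstring)

# Values outside range(256) are rejected.
try:
    bytes([256])
    accepted = True
except ValueError:
    accepted = False

print(as_tuple == btuple, round_trip == bstring, total, accepted)
```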

It's actually str that behaves weirdly in Python. If you index a str, you don't get a char object like you would in some other languages; you get another str of length 1, which can itself be indexed again.

>>> string = 'hello'
>>> string[0]
'h'
>>> type(string[0])
<class 'str'>
>>> string[0][0]
'h'
>>> string[0][0][0]
'h'
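If you do want str-like behaviour from bytes, where each element is itself a one-byte bytes object, one possible approach is to slice instead of index. The helper name iter_single_bytes below is hypothetical, not something from the standard library:

```python
def iter_single_bytes(data):
    """Hypothetical helper: yield each byte of data as a length-1 bytes object."""
    for i in range(len(data)):
        yield data[i:i + 1]

# Plain iteration over bytes yields integers...
ints = list(b'hello')

# ...while the helper yields one-byte bytes objects, much like iterating a str.
chunks = list(iter_single_bytes(b'hello'))

print(ints)
print(chunks)
```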