How to replace '\\' with '\' without raising an EOL error?

I am reading from a file that contains byte data, but when I open the file and store the result of readline in a variable, I get a string with escaped backslashes. So when I try to decode that data, I get the same string back instead of the decoded text.

File Example:

b'\xe0\xa8\xaa\xe0\xa9\xb0\xe0\xa8\x9c\xe0\xa8\xbe\xe0\xa8\xac\xe0\xa9\x80'
b'\xd9\xbe\xd9\x86\xd8\xac\xd8\xa7\xd8\xa8\xdb\x8c'
b'\xd9\xbe\xda\x9a\xd8\xaa\xd9\x88'

readline returns:

"b'\\xe0\\xa8\\xaa\\xe0\\xa9\\xb0\\xe0\\xa8\\x9c\\xe0\\xa8\\xbe\\xe0\\xa8\\xac\\xe0\\xa9\\x80'"

I get why there is an extra backslash, but I don't know how to remove it or read the file without it.

I have tried to replace those double backslashes but that raises an EOL error.

CodePudding user response：

If you are opening the file with mode 'rb', that is likely the problem. The file is still just a text file.

When I use no mode arguments I get "b'\xe0\xa8\xaa\xe0\xa9\xb0\xe0\xa8\x9c\xe0\xa8\xbe\xe0\xa8\xac\xe0\xa9\x80' "
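For context on the doubled backslashes and the EOL error mentioned in the question, here is a small sketch. It assumes the asker wrote something like '\' as a string literal, which the question does not actually show. The point is that the string read from the file holds single backslashes; the doubling only appears when the string is echoed with repr().

```python
# What readline() hands back for a line of the file (newline stripped).
# The raw-string prefix keeps the backslashes exactly as they are on disk.
s = r"b'\xe0\xa8\xaa'"

single = s.count("\\")        # backslashes actually stored in the string
shown = repr(s).count("\\")   # backslashes shown when the string is echoed

print(s)        # single backslashes
print(repr(s))  # doubled backslashes, display only

# Writing '\' as a literal is a syntax error because the backslash escapes
# the closing quote; a lone backslash has to be spelled "\\".
try:
    compile("x = '\\'", "<example>", "exec")  # source text is: x = '\'
    literal_error = None
except SyntaxError as exc:
    literal_error = exc
```

So there is nothing to replace in the string itself; what is needed is to parse the text as a bytes literal.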

CodePudding user response：

To convert the string representation of a bytes object to an actual bytes object, you could use ast.literal_eval().

>>> s = "b'\\xe0\\xa8\\xaa'"
>>> import ast
>>> b = ast.literal_eval(s)
>>> b
b'\xe0\xa8\xaa'
>>> b.decode('utf-8')
'ਪ'
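Applied to a whole file, the same idea could look roughly like the sketch below. The helper name decode_repr_lines and the file name in the usage comment are made up for illustration, and it assumes every non-empty line holds exactly one bytes literal encoded as UTF-8.

```python
import ast


def decode_repr_lines(lines):
    """Parse each line as a Python bytes literal and decode it as UTF-8."""
    decoded = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value = ast.literal_eval(line)
        if not isinstance(value, bytes):
            # literal_eval accepts any Python literal, so check the type
            raise ValueError(f"expected a bytes literal, got {type(value).__name__}")
        decoded.append(value.decode("utf-8"))
    return decoded


# Two lines taken from the file example in the question
sample = [
    r"b'\xe0\xa8\xaa'" + "\n",
    r"b'\xd9\xbe\xda\x9a\xd8\xaa\xd9\x88'" + "\n",
]
result = decode_repr_lines(sample)
print(result)

# Hypothetical usage with a real file:
# with open("data.txt") as f:
#     print(decode_repr_lines(f))
```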

That said, why does the file contain Python representations of bytes objects in the first place? Where is it coming from? If you're the one creating it, why not write the bytes themselves, or at least use a standard serialization format like JSON?
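If you do control how the file is written, a JSON version might look something like this sketch. The file name names.json and the temporary directory are only for illustration; the strings are the decoded forms of the three lines in the question.

```python
import json
import os
import tempfile

# The three entries from the question, decoded to ordinary str objects
names = [
    b"\xe0\xa8\xaa\xe0\xa9\xb0\xe0\xa8\x9c\xe0\xa8\xbe\xe0\xa8\xac\xe0\xa9\x80".decode("utf-8"),
    b"\xd9\xbe\xd9\x86\xd8\xac\xd8\xa7\xd8\xa8\xdb\x8c".decode("utf-8"),
    b"\xd9\xbe\xda\x9a\xd8\xaa\xd9\x88".decode("utf-8"),
]

path = os.path.join(tempfile.mkdtemp(), "names.json")  # illustrative location

# Write the text as UTF-8 JSON; ensure_ascii=False keeps it human-readable
with open(path, "w", encoding="utf-8") as f:
    json.dump(names, f, ensure_ascii=False)

# Reading it back gives str objects directly, with no decoding step
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded)
```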

Note that ast.literal_eval() does not execute arbitrary code, but if the source of the file is untrusted, an attacker could still crash your Python interpreter with a sufficiently large or deeply nested input.
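If that risk matters, one option is to skip Python parsing entirely and accept only a narrow format. The sketch below is not a general replacement for ast.literal_eval(): it assumes every byte is written as a \xNN escape, as in the file shown in the question, and rejects anything else (including bytes that repr() would print as plain ASCII). The function name parse_hex_escaped is made up for this example.

```python
import re

# Matches b'...' where the body is zero or more \xNN escapes and nothing else
_HEX_BYTES = re.compile(r"b'((?:\\x[0-9a-fA-F]{2})*)'")


def parse_hex_escaped(line):
    """Convert text like b'\\xe0\\xa8' (all bytes hex-escaped) to bytes."""
    match = _HEX_BYTES.fullmatch(line.strip())
    if match is None:
        raise ValueError("line is not a fully hex-escaped bytes literal")
    # Drop the \x prefixes and let bytes.fromhex do the conversion
    return bytes.fromhex(match.group(1).replace("\\x", ""))


parsed = parse_hex_escaped(r"b'\xe0\xa8\xaa'")
print(parsed.decode("utf-8"))
```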