Convert bytes object to string object in Python

python code

#!python3

import sys
import os.path
import codecs

if not os.path.exists(sys.argv[1]):
    print("File does not exist: " + sys.argv[1])
    sys.exit(1)
file_name = sys.argv[1]

with codecs.open(file_name, 'rb', errors='ignore') as file:
    file_contents = file.readlines()

for line_content in file_contents:
    print(type(line_content))
    line_content = codecs.decode(line_content)
    print(line_content)
    print(type(line_content))

File content: Log.txt

b'\x03\x00\x00\x00\xc3\x8a\xc3\xacRb\x00\x00\x00\x00042284899:ATBADSFASF:DSF456582:US\r\n1'

Output:

python3 file_convert.py Log.txt
<class 'bytes'>
b'\x03\x00\x00\x00\xc3\x8a\xc3\xacRb\x00\x00\x00\x00042284899:ATBADSFASF:DSF456582:US\r\n1'
<class 'str'>

I tried all of the methods below:

line_content = line_content.decode('UTF-8')
line_content = line_content.decode()
line_content = codecs.decode(line_content, 'UTF-8')

Is there any other way to handle this?
The line_content variable still holds the byte data; only the type changes to str, which is kind of confusing.

CodePudding user response：

The data in Log.txt is the string representation of a Python bytes object. That is odd, but we can deal with it. Since it's a bytes literal, evaluate it, which converts it to a real Python bytes object. There is still the question of what its encoding is.
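To see why decode() appeared to do nothing, it may help to compare the two cases side by side. The sketch below assumes the file stores the characters b'...' as text; the sample is a shortened, made-up version of the line in Log.txt, and the names raw_line, as_text, real_bytes, and decoded are illustrative only.

```python
import ast

# Hypothetical sample: what the file stores is the *text* of a bytes
# literal, so reading it in binary mode yields bytes that spell "b'...'".
raw_line = b"b'\\x03\\x00\\xc3\\x8aRb:US\\r\\n1'"

# Decoding only turns those characters into a str; the escapes stay escaped.
as_text = raw_line.decode("utf-8")
print(as_text)

# Evaluating the literal produces the real bytes object it describes.
real_bytes = ast.literal_eval(as_text)
print(type(real_bytes))

# Only now does decoding interpret \xc3\x8a as a single character.
decoded = real_bytes.decode("utf-8")
print(repr(decoded))
```

The first print still shows the b'...' text, which matches the behavior described in the question; the last one shows the decoded string.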

I don't see any advantage to using codecs.open. That's a way to read Unicode files in Python 2.7 and is not usually needed in Python 3. Guessing UTF-8, your code would be:

#!python3

import sys
import os
import ast

if not os.path.exists(sys.argv[1]):
    print("File does not exist: " + sys.argv[1])
    sys.exit(1)
file_name = sys.argv[1]

with open(file_name) as file:
    file_contents = file.readlines()

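for line_content in file_contents:
    print(type(line_content))  # str: the text of a bytes literal
    # Parse the literal into a real bytes object, then decode it (UTF-8 is a guess).
    line_content = ast.literal_eval(line_content).decode("utf-8")
    print(line_content)
    print(type(line_content))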
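
Because the encoding is only a guess, a slightly more defensive variant may be useful. The sketch below is one possible approach, not part of the original answer: parse_bytes_literal_line is a hypothetical helper that skips blank or non-literal lines and falls back to errors="replace" when the bytes are not valid in the assumed encoding.

```python
import ast

def parse_bytes_literal_line(line, encoding="utf-8"):
    """Hypothetical helper: turn the text of a bytes literal into a str.

    Returns None for blank lines and for lines that are not a bytes literal.
    """
    line = line.strip()
    if not line:
        return None
    try:
        value = ast.literal_eval(line)
    except (ValueError, SyntaxError):
        return None  # not a valid Python literal
    if not isinstance(value, bytes):
        return None
    try:
        return value.decode(encoding)
    except UnicodeDecodeError:
        # Assumed fallback: keep going and mark undecodable bytes with U+FFFD.
        return value.decode(encoding, errors="replace")

print(repr(parse_bytes_literal_line(r"b'\xc3\x8aRb:US'")))
print(repr(parse_bytes_literal_line(r"b'\xff:US'")))
```

Whether replacing undecodable bytes is acceptable depends on what the log is used for; if the binary prefix matters, keeping the bytes object and decoding only the text portion may be the better choice.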

CodePudding user response：

I think it's a list, not a string. Whenever you see a byte string that starts with \ (a backslash), it's potentially a list.

Try this:

decoded_line_content = list(line_content)
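
It may be worth checking what list() actually returns before relying on this suggestion. As a small sketch with a made-up sample value: list() on a bytes object yields integer byte values, and on a str it yields single characters, so neither produces decoded text.

```python
sample = b'\x03\x00Rb'  # hypothetical sample, not the full log line

# Iterating over bytes gives the integer value of each byte.
print(list(sample))       # [3, 0, 82, 98]

# Iterating over a str gives one-character strings.
print(list("b'\\x03'"))   # ['b', "'", '\\', 'x', '0', '3', "'"]
```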