Convert bytes object to string object in python

Time:04-20

python code

#!python3

import sys
import os.path
import codecs

if not os.path.exists(sys.argv[1]):
    print("File does not exist: " + sys.argv[1])
    sys.exit(1)
file_name = sys.argv[1]

with codecs.open(file_name, 'rb', errors='ignore') as file:
    file_contents = file.readlines()

for line_content in file_contents:
    print(type(line_content))
    line_content = codecs.decode(line_content)
    print(line_content)
    print(type(line_content))

File content: Log.txt

b'\x03\x00\x00\x00\xc3\x8a\xc3\xacRb\x00\x00\x00\x00042284899:ATBADSFASF:DSF456582:US\r\n1'

Output:

python3 file_convert.py Log.txt
<class 'bytes'>
b'\x03\x00\x00\x00\xc3\x8a\xc3\xacRb\x00\x00\x00\x00042284899:ATBADSFASF:DSF456582:US\r\n1'
<class 'str'>

I tried all of the methods below:

line_content = line_content.decode('UTF-8')
line_content = line_content.decode()
line_content = codecs.decode(line_content, 'UTF-8')

Is there any other way to handle this?
The line_content variable still prints as a bytes literal (b'...'); only its type changes to str, which is kind of confusing.

CodePudding user response:

The data in Log.txt is the string representation of a Python bytes object: the file holds the characters b, ', \, x, 0, 3 and so on, not the raw byte values. That is odd, but we can deal with it. Since it's a bytes literal, evaluate it with ast.literal_eval, which converts it to a real Python bytes object. That still leaves the question of which encoding those bytes use.
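As a rough illustration of why decode() alone only changes the type, here is a minimal sketch using the sample line from the question. The variable names are made up for the example, and UTF-8 is an assumption about the encoding, not something the file confirms.

```python
import ast

# What a binary read of Log.txt returns for the sample line. The doubled
# backslashes mean the file contains the characters '\', 'x', '0', '3',
# not the single byte 0x03.
raw_line = b"b'\\x03\\x00\\x00\\x00\\xc3\\x8a\\xc3\\xacRb\\x00\\x00\\x00\\x00042284899:ATBADSFASF:DSF456582:US\\r\\n1'"

# Decoding only turns those characters into a str; it still reads b'...'.
as_text = raw_line.decode("utf-8")

# Evaluating the literal yields the bytes object it describes.
real_bytes = ast.literal_eval(as_text)

# Only now does decoding produce the actual text (UTF-8 is a guess).
text = real_bytes.decode("utf-8")

print(type(as_text), as_text[:6])
print(type(real_bytes), real_bytes[:4])
print(repr(text))
```

The first print shows a str that still begins with b', which matches the confusing output in the question; the second shows a genuine bytes object.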

I don't see any advantage to using codecs.open. That's a way to read Unicode files in Python 2.7 and is not usually needed in Python 3, where the built-in open does the job. Guessing UTF-8, your code would be:

#!python3

import sys
import os
import ast

if not os.path.exists(sys.argv[1]):
    print("File does not exist: " + sys.argv[1])
    sys.exit(1)
file_name = sys.argv[1]

with open(file_name) as file:
    file_contents = file.readlines()

for line_content in file_contents:
    print(type(line_content))
    line_content = ast.literal_eval(line_content).decode("utf-8")
    print(line_content)
    print(type(line_content))
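If the log might also contain blank lines, lines that are not bytes literals, or bytes that are not valid in the guessed encoding, a more defensive variant could look like the sketch below. The helper name decode_literal_line is hypothetical, and returning None for unusable lines is just one possible choice.

```python
import ast

def decode_literal_line(line, encoding="utf-8"):
    """Return the decoded text for one line holding a bytes literal, else None."""
    line = line.strip()
    if not line:
        return None  # blank line: nothing to evaluate
    try:
        value = ast.literal_eval(line)
    except (ValueError, SyntaxError):
        return None  # the line is not a valid Python literal
    if not isinstance(value, bytes):
        return None  # a literal, but not a bytes literal
    # errors="replace" substitutes U+FFFD instead of raising
    # if the guessed encoding turns out to be wrong.
    return value.decode(encoding, errors="replace")

print(repr(decode_literal_line("b'\\xc3\\x8aRb'\n")))
print(decode_literal_line("hello world"))
```

ast.literal_eval is used rather than eval because it only accepts Python literals, so a log line cannot run arbitrary code.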

CodePudding user response:

I think it's a list, not a string. When a byte string starts with \ (a backslash), it's potentially a list.

Try this:

decoded_line_content = list(line_content)
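When trying this suggestion, it may help to check what list() returns for each type. The small sketch below uses made-up sample values: on a bytes object list() yields integer byte values, and on a str it yields single characters, so neither call decodes anything.

```python
sample_bytes = b"\x03\x00Rb"   # a real bytes object
sample_text = "b'\\x03'"       # a str holding the text of a bytes literal

bytes_as_list = list(sample_bytes)
text_as_list = list(sample_text)

print(bytes_as_list)  # [3, 0, 82, 98]
print(text_as_list)   # ['b', "'", '\\', 'x', '0', '3', "'"]
```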