python code
#!python3
import sys
import os.path
import codecs
if not os.path.exists(sys.argv[1]):
    print("File does not exist: " + sys.argv[1])
    sys.exit(1)
file_name = sys.argv[1]
with codecs.open(file_name, 'rb', errors='ignore') as file:
    file_contents = file.readlines()
for line_content in file_contents:
    print(type(line_content))
    line_content = codecs.decode(line_content)
    print(line_content)
    print(type(line_content))
Contents of Log.txt:
b'\x03\x00\x00\x00\xc3\x8a\xc3\xacRb\x00\x00\x00\x00042284899:ATBADSFASF:DSF456582:US\r\n1'
Output:
python3 file_convert.py Log.txt
<class 'bytes'>
b'\x03\x00\x00\x00\xc3\x8a\xc3\xacRb\x00\x00\x00\x00042284899:ATBADSFASF:DSF456582:US\r\n1'
<class 'str'>
I tried all of the methods below:
line_content = line_content.decode('UTF-8')
line_content = line_content.decode()
line_content = codecs.decode(line_content, 'UTF-8')
Is there any other way to handle this?
The line_content variable still holds the byte data and only the type changes to str, which is kind of confusing.
CodePudding user response:
The data in Log.txt is the string representation of a Python bytes object. That is odd, but we can deal with it. Since it's a bytes literal, evaluate it with ast.literal_eval, which converts it to a real Python bytes object. That still leaves the question of what its encoding is.
I don't see any advantage to using codecs.open. That's a way to read Unicode files in Python 2.7 and is not usually needed in Python 3. Guessing UTF-8, your code would be:
#!python3
import sys
import os
import ast
if not os.path.exists(sys.argv[1]):
    print("File does not exist: " + sys.argv[1])
    sys.exit(1)
file_name = sys.argv[1]
with open(file_name) as file:
    file_contents = file.readlines()
for line_content in file_contents:
    print(type(line_content))
    # Turn the text of the bytes literal into real bytes, then decode it.
    line_content = ast.literal_eval(line_content).decode("utf-8")
    print(line_content)
    print(type(line_content))
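To see the idea without a file, here is a minimal sketch that applies the same two steps to the sample line from the question. It assumes the file really holds the text of a bytes literal and that the bytes are UTF-8; both are guesses, and the variable names line, raw and text are only for illustration.

import ast

# One line as read from Log.txt in text mode: the *text* of a bytes
# literal (sample copied from the question), not real binary data.
line = r"b'\x03\x00\x00\x00\xc3\x8a\xc3\xacRb\x00\x00\x00\x00042284899:ATBADSFASF:DSF456582:US\r\n1'"

# Decoding this text directly changes nothing visible: it is already
# plain characters, which is why only the type appeared to change.
print(line.encode("utf-8").decode("utf-8") == line)

raw = ast.literal_eval(line)   # safely evaluates the literal into bytes
text = raw.decode("utf-8")     # UTF-8 is an assumption about the data

print(type(raw), type(text))
print(repr(text))

ast.literal_eval only accepts Python literals, so unlike eval it will not run arbitrary code found in the file. It raises an error on lines that are not valid literals, such as blank lines.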
CodePudding user response:
I think it's a list, not a string. Whenever you see a byte string that starts with \ (a backslash), it's potentially a list. Try this:
decoded_line_content = list(line_content)
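As a quick check of what this call returns, here is a small sketch assuming line_content is a real bytes object. The short sample value is hypothetical, not the full line from Log.txt. list() yields the integer value of each byte, so a decode step is still needed to get readable text.

# Hypothetical short sample standing in for line_content.
raw = b"\x03\x00Rb"

as_list = list(raw)         # one integer per byte
print(as_list)              # [3, 0, 82, 98]

as_text = raw.decode("utf-8")   # decoding, by contrast, gives a str
print(repr(as_text))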