I am writing a script to convert HTML to AMP, and I have this code:
#!/usr/bin/python3
import argparse
from amp_tools import TransformHtmlToAmp
import codecs
arg_parser = argparse.ArgumentParser( description = "Copy source_file as target_file." )
arg_parser.add_argument( "source_file" )
arg_parser.add_argument( "target_file" )
arguments = arg_parser.parse_args()
source = arguments.source_file
target = arguments.target_file
html = ""
with codecs.open(source, encoding='utf-8', mode='r+') as f:
    for line in f:
        html = html + line.rstrip()
valid_amp = str(TransformHtmlToAmp(html)())
with codecs.open(target, encoding='utf-8', mode='w+') as f:
    f.write(valid_amp.rstrip())
    f.seek(0)
#print(str(valid_amp))
print( target, "successfully created !!" )
Now, this works, but the content saved to the file is enclosed in b''. I don't want that. Is there a way to avoid the quotes in the output file?
Sample input: <!doctype html> <html lang="en"> <head> <title>News Article</title> <link href="base.css" rel="stylesheet" /> <script type="text/javascript" src="base.js"></script> </head> <body> <header> News Site </header> <article> <h1>Article Name</h1> <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam egestas tortor sapien, non tristique ligula accumsan eu.</p> </article> <img src="https://www.travelmanagers.com.au/wp-content/uploads/2012/08/AdobeStock_254529936_Railroad-to-Denali-National-Park-Alaska_750x500.jpg"> </body> </html>
Output: b'<div lang="en" > <head> <title>News Article</title> <link href="base.css" rel="stylesheet"> <script type="text/javascript" src="base.js"></script> </head> <body> <header> News Site </header> <article> <h1>Article Name</h1> <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam egestas tortor sapien, non tristique ligula accumsan eu.</p> </article> <amp-img src="https://www.travelmanagers.com.au/wp-content/uploads/2012/08/AdobeStock_254529936_Railroad-to-Denali-National-Park-Alaska_750x500.jpg" width="750" height="500" layout="responsive"></amp-img> </body></div>'
CodePudding user response:
You should replace the line:
valid_amp = str(TransformHtmlToAmp(html)())
with:
valid_amp = bytes(TransformHtmlToAmp(html)()).decode("utf-8")
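Why this helps, as far as the output in the question suggests: the transformer appears to return a bytes object, and calling str() on bytes gives its printed representation, b'' wrapper included, rather than the text itself. A minimal sketch of the difference, using a made-up bytes literal (`raw`) as a stand-in for the real return value so it runs without amp_tools:

```python
# Hypothetical stand-in for what TransformHtmlToAmp(html)() seems to return.
raw = b'<amp-img src="a.jpg"></amp-img>'

# str() on bytes produces the repr, so the b'...' wrapper becomes part of the string.
as_repr = str(raw)

# decode() turns the bytes into real text using the given encoding.
as_text = raw.decode("utf-8")

print(as_repr)
print(as_text)
```

The first print shows the markup wrapped in b'', the second shows only the markup. Passing the encoding to str, as in str(raw, "utf-8"), should give the same result as decode.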