I have a file loaded in memory as a bytes object. How can I get exactly the same contents as if I had loaded it from disk with open?
Consider the following:
import io

type(downloaded_bytes)  # bytes
f = io.StringIO(downloaded_bytes.decode('utf-8')).readlines()
f2 = open(r"file.log", "r").readlines()
f == f2  # False
The main difference I noticed when inspecting the two is that the version retrieved as bytes has different line breaks. For example, in f2, a line reads like this:
'Initializing error logging... Success\n',
While in the bytes-derived version, f, the same line reads:
'Initializing error logging... Success\r\n',
In other places, \n (the expected line break) is replaced by \r in the bytes version.
How can I force f to be exactly like f2 here?
CodePudding user response:
If you want to disable line-ending translation while still operating on str, the correct solution is to pass newline='' (or newline="") to open. It still decodes the input to str and recognizes any form of line separator (\r\n, \n or \r) as a line break, but it doesn't normalize the separator to a simple \n:
with open(r"file.log", newline='') as f2in:  # the with statement guarantees the file is closed
    f2 = f2in.readlines()
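For the in-memory side to split lines the same way, the StringIO probably needs the same newline='' argument, since by default StringIO only breaks lines on \n. A minimal sketch, where sample_bytes is a made-up stand-in for downloaded_bytes:

```python
import io

# Hypothetical sample standing in for downloaded_bytes; it mixes all three line endings.
sample_bytes = b"first\r\nsecond\nthird\rfourth"

# newline='' recognizes \r\n, \n and \r as line breaks but leaves them untranslated,
# mirroring open(..., newline='') on the disk side.
lines = io.StringIO(sample_bytes.decode("utf-8"), newline="").readlines()
print(lines)
```

Each element keeps its original terminator, so a list built this way can be compared directly with the f2 read above.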
Alternatively, to get rid of the \r in the downloaded bytes rather than preserving it in the file read from disk, the simplest solution is to perform the line-ending translation yourself (adding import os to the top of the file if needed to get the os.linesep definition):
f = io.StringIO(downloaded_bytes.decode('utf-8').replace(os.linesep, '\n')).readlines()
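Note that replacing os.linesep only handles the platform's own separator, so the lone \r endings mentioned in the question would stay as they are. One way to cover all three forms is to wrap the bytes in the same text layer that open uses. This is a sketch, not a drop-in fix: read_lines_like_open and sample_bytes are hypothetical names, and it assumes the file is UTF-8 (open without an encoding argument uses the locale's default instead).

```python
import io

# Hypothetical sample standing in for downloaded_bytes.
sample_bytes = b"first\r\nsecond\nthird\rfourth"

def read_lines_like_open(data, encoding="utf-8"):
    """Decode bytes and split into lines with universal-newline translation.

    newline=None is the default text-mode behaviour of open(): \r\n and \r
    are both translated to \n before the lines are returned.
    """
    with io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline=None) as wrapper:
        return wrapper.readlines()

lines = read_lines_like_open(sample_bytes)
print(lines)
```

With this, f = read_lines_like_open(downloaded_bytes) should give the same list as the text-mode readlines() on the file, provided both sides decode with the same encoding.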
CodePudding user response:
You're running on Windows, which by convention uses '\r\n' to terminate lines in "text mode". Open your file in "binary mode" instead:
f2 = open(r"file.log", "rb").readlines()
Note the trailing b in the second argument to open(). With it, no line-ending translation happens, and readlines() returns bytes objects rather than str.
CodePudding user response:
Well, don't use StringIO for binary stuff, use BytesIO!
from io import BytesIO
f = BytesIO(downloaded_bytes)
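To check that the two binary approaches agree, here is a small self-contained sketch. The sample bytes and the temporary file are stand-ins for downloaded_bytes and file.log, not the real data:

```python
import io
import os
import tempfile

# Hypothetical sample standing in for downloaded_bytes.
sample_bytes = b"first\r\nsecond\nthird\rfourth"

# Write the bytes to a temporary file so there is something on disk to compare against.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample_bytes)
    tmp_path = tmp.name

try:
    # Binary mode: no decoding and no line-ending translation.
    with open(tmp_path, "rb") as fh:
        from_disk = fh.readlines()
finally:
    os.remove(tmp_path)

from_memory = io.BytesIO(sample_bytes).readlines()
print(from_memory == from_disk)
```

In binary mode both sides split on b'\n' only, so a lone \r does not end a line and the elements are bytes, not str.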