Python: Bytes Object and Open -- How to create equivalent files?

Time:03-16

I have a file loaded in memory as a bytes object. How can I get the same result as if I had loaded it from disk with open?

Consider the following:

import io

type(downloaded_bytes)  # bytes
f = io.StringIO(downloaded_bytes.decode('utf-8')).readlines()
f2 = open(r"file.log", "r").readlines()
f == f2  # False

The main difference I noticed when inspecting the two results is that the version retrieved as bytes has different line breaks. For example, in f2, a line reads like this:

'Initializing error logging... Success\n',

While in the bytes derived file, f, the same line reads:

'Initializing error logging... Success\r\n',

In other places, \n (the expected line break) is replaced by \r in the bytes-derived version.

How might I force f to be exactly like f2 here?

CodePudding user response:

If you want to disable line-ending translation while still operating on str, pass newline='' (or newline="") to open. It still decodes the input to str and still recognizes any form of line separator (\r\n, \n or \r) as a line break, but it doesn't normalize the separator to a plain \n:

with open(r"file.log", newline='') as f2in:  # the with statement guarantees the file is closed
    f2 = f2in.readlines()
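
One caveat worth checking: io.StringIO splits lines only on \n by default, so if the data contains a lone \r (as the question suggests), the in-memory side may need newline='' as well. A minimal sketch, assuming made-up sample bytes and a temporary file in place of the real file.log:

```python
import io
import os
import tempfile

# Hypothetical sample data with mixed line endings, standing in for downloaded_bytes.
downloaded_bytes = b"first\r\nsecond\rthird\n"

# Write the same bytes to a temporary file so both read paths see identical input.
with tempfile.NamedTemporaryFile(delete=False, suffix=".log") as tmp:
    tmp.write(downloaded_bytes)
    path = tmp.name

try:
    # newline='' leaves \r\n, \r and \n untouched but still splits on each of them.
    with open(path, encoding="utf-8", newline="") as f2in:
        f2 = f2in.readlines()
finally:
    os.remove(path)

# Give StringIO the same newline='' so it splits lines the same way.
f = io.StringIO(downloaded_bytes.decode("utf-8"), newline="").readlines()

print(f == f2)
```

The explicit encoding="utf-8" on open is an addition here, so that both sides decode the bytes the same way regardless of the platform default.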

Alternatively, to get rid of the \r in the downloaded bytes rather than preserving it in the file read from disk, the simplest solution is to perform the line-ending translation yourself (add import os at the top of the file if needed for os.linesep). Note that os.linesep is '\r\n' only on Windows, and that this replacement leaves a lone \r untouched:

f = io.StringIO(downloaded_bytes.decode('utf-8').replace(os.linesep, '\n')).readlines()
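
If the goal is to mimic what open does in text mode on any platform, including lone \r characters, one possible alternative is to wrap the bytes in io.TextIOWrapper, which applies the same universal-newlines translation when newline is left at its default. This is a sketch with made-up sample data, not part of the original answer:

```python
import io

# Hypothetical sample data with mixed line endings, standing in for downloaded_bytes.
downloaded_bytes = b"first\r\nsecond\rthird\n"

# TextIOWrapper decodes the bytes and, with the default newline=None,
# translates \r\n and \r to \n on read, as open() does in text mode.
wrapper = io.TextIOWrapper(io.BytesIO(downloaded_bytes), encoding="utf-8")
f = wrapper.readlines()

print(f)
```

For the comparison with f2 to hold, the file on disk would presumably need to be opened with the same encoding, since open otherwise uses a platform-dependent default.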

CodePudding user response:

You're running on Windows, which by convention uses '\r\n' to terminate lines; in "text mode", Python translates these to '\n' when reading. Open your file in "binary mode" instead:

f2 = open(r"file.log", "rb").readlines()

Note the trailing b in the second argument to open(). With it, no line-ending translation happens, and readlines() returns bytes rather than str.

CodePudding user response:

Well, don't use StringIO for binary stuff, use BytesIO!

from io import BytesIO

f = BytesIO(downloaded_bytes)
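
Paired with the binary-mode open from the previous answer, this should give matching results, since neither side decodes or translates anything. A rough sketch, assuming made-up sample bytes and a temporary file in place of the real file.log:

```python
import io
import os
import tempfile

# Hypothetical sample data with mixed line endings, standing in for downloaded_bytes.
downloaded_bytes = b"first\r\nsecond\rthird\n"

# Write the same bytes to a temporary file to play the role of the file on disk.
with tempfile.NamedTemporaryFile(delete=False, suffix=".log") as tmp:
    tmp.write(downloaded_bytes)
    path = tmp.name

try:
    # Binary mode: no decoding and no line-ending translation.
    with open(path, "rb") as f2in:
        f2 = f2in.readlines()
finally:
    os.remove(path)

# BytesIO behaves like a binary file held in memory.
f = io.BytesIO(downloaded_bytes).readlines()

print(f == f2)
```

Both lists contain bytes, and lines are split only on \n in binary mode, so a lone \r stays inside its line.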