Replacing \n while keeping \r\n intact

Time:09-29

I have a huge CSV file (196,244 lines) that contains \n characters in places other than line endings. I want to remove those \n characters but keep \r\n intact. I tried line.replace, but it does not seem to recognize \r\n, so next I tried a regex:

with open(filetoread, "r") as inf:
    with open(filetowrite, "w") as fixed:
        for line in inf:
            line = re.sub("(?<!\r)\n", " ", line)
            fixed.write(line)

But it does not keep \r\n; it replaces every line break. I can't do this in Notepad either, because it crashes on this file.

CodePudding user response:

You are not exposing the original line breaks to the regex engine. When a file is opened in r (text) mode, the line breaks are "normalized" to LF, so every \r\n has already become \n by the time the regex sees it. To keep them exactly as they are in the input, read the file in binary mode using b. Then you also need to use the b prefix on the regex pattern and on the replacement.
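
To see the normalization for yourself, here is a small sketch. The file name sample.csv and the sample bytes are made up for illustration; the only thing it relies on is the documented behaviour of open() in text and binary mode.

```python
import os
import tempfile

# Sample data (hypothetical): one bare LF and two CRLF line endings.
sample = b"foo\nbar\r\nbaz\r\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")  # hypothetical file name
    with open(path, "wb") as fh:
        fh.write(sample)

    # Default text mode: universal newlines turns every \r\n into \n.
    with open(path, "r") as fh:
        text_default = fh.read()

    # Binary mode: the bytes come back exactly as stored.
    with open(path, "rb") as fh:
        raw = fh.read()

print(repr(text_default))  # 'foo\nbar\nbaz\n'
print(repr(raw))           # b'foo\nbar\r\nbaz\r\n'
```

In text mode the \r is already gone, so a lookbehind such as (?<!\r) has nothing left to check against.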

You can use

import re

with open(filetoread, "rb") as inf:
    with open(filetowrite, "wb") as fixed:
        # Replace only the LF bytes that are not preceded by CR
        fixed.write(re.sub(b"(?<!\r)\n", b" ", inf.read()))

Now the whole file is read into a single bytes object (with inf.read()), so the original line breaks are visible to the regex and the bare \n ones are replaced.

Pay attention to

  • "rb" when reading file in
  • "wb" to write file out
  • re.sub(b"(?<!\r)\n", b" ", inf.read()) uses the b prefix on both the pattern and the replacement, and inf.read() reads the file contents into a single variable.
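
As a quick check of what the pattern does, here is a minimal sketch on an in-memory sample. The helper name replace_bare_lf and the sample bytes are hypothetical and only there for illustration.

```python
import re

# Matches an LF byte that is not immediately preceded by a CR byte.
BARE_LF = re.compile(rb"(?<!\r)\n")


def replace_bare_lf(data: bytes, repl: bytes = b" ") -> bytes:
    """Replace every bare LF in data with repl, leaving CRLF pairs untouched."""
    return BARE_LF.sub(repl, data)


sample = b"a,b\nc\r\nd,e\r\n"  # hypothetical CSV fragment with one stray LF
result = replace_bare_lf(sample)
print(result)  # b'a,b c\r\nd,e\r\n'
```

The stray \n inside the first record becomes a space, while both \r\n record terminators survive.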

CodePudding user response:

When you open a file with a plain open() call, TextIOWrapper presents a view of the file in which the various newline styles are all translated to \n.

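Explicitly setting newline="\r\n" when reading makes \r\n the only line terminator and returns it untranslated, so a bare \n stays inside the line. When writing, use newline="" so the text is written as-is; with newline="\r\n" each \n written is translated to \r\n, so an explicit "\r\n" would come out as \r\r\n.

with open(path_src, newline="\r\n") as fh_src:
    with open(path_dest, "w", newline="") as fh_dest:  # "" writes text untranslated
        for line in fh_src:  # file-likes are iterable by-lines; lines end only at \r\n
            ends_crlf = line.endswith("\r\n")  # the last line may lack a terminator
            body = line[:-2] if ends_crlf else line
            fh_dest.write(body.replace("\n", " "))
            if ends_crlf:
                fh_dest.write("\r\n")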

Example content:

>>> with open("test.data", "wb") as fh:
...     fh.write(b"""foo\nbar\r\nbaz\r\n""")
...
14
>>> with open("test.data", newline="\r\n") as fh:
...     for line in fh:
...         print(repr(line))
...
'foo\nbar\r\n'
'baz\r\n'
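
For a complete round trip on the same sample bytes, here is a hedged sketch. The function name join_bare_lf and the output file name fixed.data are made up. The output file is opened with newline="" so that Python writes the text untranslated; with newline="\r\n" on the write side, every "\n" written is translated to "\r\n", which would turn an explicit "\r\n" into "\r\r\n".

```python
import os
import tempfile


def join_bare_lf(path_src, path_dest):
    """Copy path_src to path_dest, replacing bare LF with a space, keeping CRLF."""
    with open(path_src, newline="\r\n") as fh_src, \
         open(path_dest, "w", newline="") as fh_dest:
        for line in fh_src:  # lines are split only at \r\n
            ends_crlf = line.endswith("\r\n")
            body = line[:-2] if ends_crlf else line  # last line may be unterminated
            fh_dest.write(body.replace("\n", " "))
            if ends_crlf:
                fh_dest.write("\r\n")  # written as-is because newline=""


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "test.data")
    dest = os.path.join(tmp, "fixed.data")  # hypothetical output name
    with open(src, "wb") as fh:
        fh.write(b"foo\nbar\r\nbaz\r\n")
    join_bare_lf(src, dest)
    with open(dest, "rb") as fh:
        fixed = fh.read()

print(fixed)  # b'foo bar\r\nbaz\r\n'
```

Because the file is processed one \r\n-terminated line at a time, this variant does not need to hold the whole file in memory.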