Python Regex to find CRLF

I'm trying to write a regex that will find any CRLF in a file using Python.
I can open the file successfully and use f.newlines to determine whether it uses CRLF or LF line endings, but my numerous regex attempts have failed.

with open('test.txt', 'rU') as f:
   text = f.read()
   print repr(f.newlines)
   regex = re.compile(r"[^\r\n]+", re.MULTILINE)
   print(regex.match(text))

I've done numerous iterations on the regex, and in every case it will either detect \n as \r\n or not work at all.

CodePudding user response：

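You could use the re module to search for the \r\n, \r, and \n patterns. The file has to be opened without newline translation, otherwise every line ending reaches the regex as \n.

import re

# newline="" (Python 3) turns off newline translation, so \r\n and \r
# reach the regex intact; the "rU" mode would convert them all to \n first.
with open("test.txt", "r", newline="") as f:
    for line in f:
        # Check CRLF first so it is not reported as a separate CR and LF.
        if re.search(r"\r\n", line):
            print("Found CRLF")
        elif re.search(r"\r", line):
            print("Found CR")
        elif re.search(r"\n", line):
            print("Found LF")
        # Normalize whichever line ending was found to LF before printing.
        line = re.sub(r"\r\n|\r", "\n", line)
        print(line)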

Assuming your test.txt file looks something like this:

This is a test file
with a line break
at the end of the file.

CodePudding user response：

As I mentioned in a comment, you're opening the file with universal newlines, which means that Python will automatically perform newline conversion when reading from or writing to the file. Your program therefore will not see CR-LF sequences; they will be converted to just LF.
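The translation described above can be observed directly. The following is a minimal Python 3 sketch, not part of the original question: it writes a couple of CRLF-terminated lines to a temporary file and reads them back in default text mode and in binary mode. The helper name read_both_ways and the sample bytes are assumptions made up for illustration.

```python
import os
import tempfile


def read_both_ways(raw):
    """Write raw bytes to a temp file, then read them back in text and binary mode.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        # Default text mode uses universal newlines: \r\n is translated to \n.
        with open(path, "r", encoding="ascii") as f:
            as_text = f.read()
            seen = f.newlines  # line endings encountered while reading
        # Binary mode returns the bytes untouched.
        with open(path, "rb") as f:
            as_bytes = f.read()
    finally:
        os.remove(path)
    return as_text, seen, as_bytes


as_text, seen, as_bytes = read_both_ways(b"one\r\ntwo\r\n")
print(repr(as_text))   # 'one\ntwo\n'
print(repr(seen))      # '\r\n'
print(repr(as_bytes))  # b'one\r\ntwo\r\n'
```

The text-mode string no longer contains any \r, which is why a regex for \r\n cannot match it, even though f.newlines still reports that the file used CRLF.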

Generally, if you want to portably observe all bytes from a file unchanged, then you must open the file in binary mode:

In Python 2:

from __future__ import print_function
import re

with open('test.txt', 'rb') as f:
   text = f.read()

regex = re.compile(r"[^\r\n]+", re.MULTILINE)
print(regex.match(text))

In Python 3:

import re

with open('test.txt', 'rb') as f:
   text = f.read()

regex = re.compile(rb"[^\r\n]+", re.MULTILINE)
print(regex.match(text))
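
Once the data is read as bytes, a pattern that targets the line endings themselves is one way to answer the original question. This is a sketch under assumptions rather than the answerer's own code: the function name count_line_endings and the sample bytes are hypothetical, and Python 3 is assumed. It uses a lookbehind and a lookahead so that the LF and CR inside a CRLF pair are not counted a second time.

```python
import re

# CRLF pairs, LF not preceded by CR, and CR not followed by LF.
CRLF = re.compile(rb"\r\n")
LONE_LF = re.compile(rb"(?<!\r)\n")
LONE_CR = re.compile(rb"\r(?!\n)")


def count_line_endings(data):
    """Count each kind of line ending in a bytes object (hypothetical helper)."""
    return {
        "CRLF": len(CRLF.findall(data)),
        "LF": len(LONE_LF.findall(data)),
        "CR": len(LONE_CR.findall(data)),
    }


sample = b"a\r\nb\nc\rd\r\n"
print(count_line_endings(sample))  # {'CRLF': 2, 'LF': 1, 'CR': 1}
```

To apply it to a real file, pass in the bytes returned by f.read() on a file opened with 'rb', as in the snippets above.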