Python Regex to find CRLF

Time:11-13

I'm trying to write a regex that will find any CRLF in Python.
I can open the file and use its newlines attribute to determine whether it uses CRLF or LF, but my numerous regex attempts have failed.

with open('test.txt', 'rU') as f:
   text = f.read()
   print repr(f.newlines)
   regex = re.compile(r"[^\r\n] ", re.MULTILINE)
   print(regex.match(text))

I've tried numerous iterations of the regex, and in every case it will either detect \n as \r\n or not work at all.

CodePudding user response:

You could try using the re library to search for the \r & \n patterns.

import re

# newline="" (Python 3) turns off newline translation, so each line keeps its original ending
with open("test.txt", "r", newline="") as f:
    for line in f:
        if re.search(r"\r\n", line):
            print("Found CRLF")
            line = re.sub(r"\r\n", "\n", line)
        elif re.search(r"\r", line):
            print("Found CR")
            line = re.sub(r"\r", "\n", line)
        elif re.search(r"\n", line):
            print("Found LF")
        print(line)

Assuming your test.txt file looks something like this:

This is a test file
with a line break
at the end of the file.

CodePudding user response:

As I mentioned in a comment, you're opening the file with universal newlines, which means that Python will automatically perform newline conversion when reading from or writing to the file. Your program therefore will not see CR-LF sequences; they will be converted to just LF.
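To make the conversion visible, here is a minimal sketch in Python 3 that writes a small sample to a temporary file and reads it back three ways. The sample bytes and the temporary file are assumptions made for illustration; they stand in for test.txt.

```python
import os
import tempfile

# Hypothetical sample: one CRLF-terminated line and one LF-terminated line.
sample = b"first\r\nsecond\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Default text mode: universal newlines translate "\r\n" to "\n".
    with open(path, "r") as f:
        translated = f.read()

    # Text mode with newline="": line endings are returned untranslated.
    with open(path, "r", newline="") as f:
        untranslated = f.read()

    # Binary mode: the raw bytes, exactly as stored on disk.
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)

print(repr(translated))    # 'first\nsecond\n'
print(repr(untranslated))  # 'first\r\nsecond\n'
print(repr(raw))           # b'first\r\nsecond\n'
```

Only the last two reads still contain a CR-LF sequence for a regex to find.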

Generally, if you want to portably observe all bytes from a file unchanged, then you must open the file in binary mode:

In Python 2:

from __future__ import print_function
import re

with open('test.txt', 'rb') as f:
   text = f.read()

# search() scans the whole text; match() would only look at the very start
regex = re.compile(r"\r\n")
print(regex.search(text))

In Python 3:

import re

with open('test.txt', 'rb') as f:
   text = f.read()

# search() scans the whole buffer; match() would only look at the very start
regex = re.compile(rb"\r\n")
print(regex.search(text))
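
Once the data is read in binary mode, distinguishing CRLF from a bare LF comes down to the pattern. The following is a sketch for Python 3, not the only way to do it: the helper name count_line_endings and the sample bytes are hypothetical, and the lookbehind and lookahead keep the LF and CR patterns from counting the halves of a CRLF pair.

```python
import re

CRLF = re.compile(rb"\r\n")
BARE_LF = re.compile(rb"(?<!\r)\n")  # LF that is not preceded by CR
BARE_CR = re.compile(rb"\r(?!\n)")   # CR that is not followed by LF


def count_line_endings(data):
    """Count each kind of line ending in a bytes object (hypothetical helper)."""
    return {
        "CRLF": len(CRLF.findall(data)),
        "LF": len(BARE_LF.findall(data)),
        "CR": len(BARE_CR.findall(data)),
    }


# Hypothetical sample mixing all three conventions.
sample = b"one\r\ntwo\nthree\rfour\r\n"
print(count_line_endings(sample))  # {'CRLF': 2, 'LF': 1, 'CR': 1}
```

For a real file, pass in the bytes returned by f.read() after opening it with 'rb'.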