Regular Expression to Remove Line Breaks from Specific Lines

Time:08-19

I have a text file with each paragraph on its own line. Some of the paragraphs were split across two lines, with the break falling just before a word. For example:

Books are an effective way to 
communicate across time, both from the past and into the future.

I could use regular expressions (regex) in the search-and-replace of Notepad or Geany to find a lowercase letter at the start of a line and replace the preceding \r\n (carriage return, line feed) with a space.
The problem is that each chapter has a subtitle that follows the word "or", and "or" sits on a line by itself. For example:

Chapter 3 
The Importance of Reading 
or
Literature is the most agreeable way of ignoring life

Using that method would join each "or" line onto the chapter title instead of leaving it on its own line.

What I want is a regex that matches a line starting with a lowercase letter (so the preceding \r\n can be replaced with a space), but not when the line is just "or\r\n".
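To see the problem concretely, here is a minimal Python sketch of the naive replacement described above, assuming Windows \r\n line endings. Python's re module has no \R, so the line break is written out as \r\n; the names NAIVE and naive_join are hypothetical.

```python
import re

# Naive approach: replace a line break with a space whenever the next
# line starts with a lowercase letter. Python's re is case-sensitive by
# default, which corresponds to "Match case" being checked.
NAIVE = re.compile(r"\r\n(?=[a-z])")

def naive_join(text: str) -> str:
    return NAIVE.sub(" ", text)

sample = (
    "Chapter 3 \r\n"
    "The Importance of Reading \r\n"
    "or\r\n"
    "Literature is the most agreeable way of ignoring life\r\n"
)
print(repr(naive_join(sample)))
```

The "or" line starts with a lowercase letter, so it gets glued onto the end of the title line, which is exactly what should not happen.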

Answer:

It looks like you could use lookarounds. Search for:

\h*\R(?=[a-z])(?!or$)

Replace with a single space. See this demo at regex101 (explanation on the right side).

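  • \h* matches any horizontal whitespace (e.g. trailing spaces) before the line break
  • \R matches any newline sequence, such as \r\n, \n or \r
  • (?=[a-z]) is a lookahead: the next line must start with a lowercase letter
  • (?!or$) is a negative lookahead: the next line must not consist of just "or"
  • $ matches end of line (Notepad++ default)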

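In the Notepad++ Replace dialog, set Search Mode to Regular expression and make sure to check [•] Match case (otherwise [a-z] also matches uppercase letters) and [•] Wrap around.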
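The same search can be sketched outside Notepad++, for example in Python. This is an approximation, not a drop-in equivalent: Python's re module supports neither \h nor \R, so the sketch substitutes [ \t] and (?:\r\n|\r|\n), which are narrower than the Notepad++ (Boost) versions that also cover other Unicode spaces and line separators. It also replaces $ with an explicit line-break check, because Python's $ (even with re.MULTILINE) matches only before \n, not before \r. The names JOIN and join_split_lines are hypothetical.

```python
import re

# Approximations, since Python's re has no \h or \R:
#   [ \t]*          trailing spaces/tabs before the break (stands in for \h*)
#   (?:\r\n|\r|\n)  a single line break (stands in for \R)
# (?=[a-z])         the next line must start with a lowercase letter
# (?!or(?:...))     ...but must not be a line consisting only of "or"
JOIN = re.compile(r"[ \t]*(?:\r\n|\r|\n)(?=[a-z])(?!or(?:\r\n|\r|\n|\Z))")

def join_split_lines(text: str) -> str:
    """Replace each qualifying line break (plus trailing blanks) with one space."""
    return JOIN.sub(" ", text)

sample = (
    "Books are an effective way to \r\n"
    "communicate across time, both from the past and into the future.\r\n"
    "Chapter 3 \r\n"
    "The Importance of Reading \r\n"
    "or\r\n"
    "Literature is the most agreeable way of ignoring life\r\n"
)
print(join_split_lines(sample))
```

Only the break before "communicate" is replaced; the title, the lone "or" and the subtitle stay on separate lines. As with the Notepad++ pattern, a line that merely begins with "or" (such as "order") is still joined.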