Regexp- replace specific line break in String

Time:11-05

I am looking for a regexp that finds a specific line break \n in a long String.

The specific \n is the one before a line that does not contain a specific char: '#'

For example:

this tis a fine #line1\nthis tis another fine #line2\nthis_belongs_to abobe line\nthis tis still is OK #line4

that represents the text:

this tis a fine #line1
this tis another fine #line2
this_belongs_to abobe line
this tis still is OK #line4

Here the \n to be removed is the one after #line2, resulting in the text:

this tis a fine #line1
this tis another fine #line2this_belongs_to abobe line
this tis still is OK #line4

I came up with a regexp like \n^(?m)(?!.*#).*$ that is close, but I can't figure out how to build one that matches and removes only the right line break and preserves the rest of the String.

Perhaps there is a better way than using a regular expression?

CodePudding user response:

You can use

text = text.replaceAll("\\R(?!.*#)", "");
text = text.replaceAll("(?m)\\R(?=[^\n#]+$)", "");

See the regex demo / regex demo #2. Details:

  • (?m) - Pattern.MULTILINE embedded flag option to make $ in this pattern match end of a line, not the end of the whole string
  • \R - any line break sequence
  • (?!.*#) - a negative lookahead that fails the match if, immediately to the right, there are zero or more chars other than line break chars (as many as possible) followed by a # char; in other words, the line after the break must not contain #
  • (?=[^\n#]+$) - a positive lookahead that requires one or more chars other than LF and # up to the end of a line (replace + with * to match an empty line, too).
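To see which line breaks each pattern actually selects, the following minimal sketch lists the match offsets instead of replacing. The class name LineBreakMatchDemo, the helper matchStarts, and the short sample strings are hypothetical and only serve this illustration; the two patterns are written out in full as \R(?!.*#) and (?m)\R(?=[^\n#]+$).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineBreakMatchDemo {
    // Hypothetical helper: returns the start offset of every match of regex in text.
    static List<Integer> matchStarts(String regex, String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Line breaks sit at offsets 4, 9 and 22; only the one at 9 precedes a line without '#'.
        String text = "a #1\nb #2\ncontinuation\nc #4";
        System.out.println(matchStarts("\\R(?!.*#)", text));           // prints [9]
        System.out.println(matchStarts("(?m)\\R(?=[^\n#]+$)", text));  // prints [9]

        // With a trailing line break the two patterns differ: the first one also
        // matches the final break (nothing follows it, so no '#' follows it either).
        String trailing = "a #1\nrest\n";
        System.out.println(matchStarts("\\R(?!.*#)", trailing));           // prints [4, 9]
        System.out.println(matchStarts("(?m)\\R(?=[^\n#]+$)", trailing));  // prints [4]
    }
}
```

The second sample shows the practical difference: the negative lookahead version also removes a break that is followed by an empty line or by the end of the string, while the + version leaves those breaks alone.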

See the Java demo online:

String s_lf = "this tis a fine #line1\nthis tis another fine #line2\nthis_belongs_to abobe line\nthis tis still is OK #line4";
String s_crlf = "this tis a fine #line1\r\nthis tis another fine #line2\r\nthis_belongs_to abobe line\r\nthis tis still is OK #line4";

// Pattern 1: remove a line break when the line after it contains no '#'
System.out.println(s_lf.replaceAll("\\R(?!.*#)", ""));
System.out.println(s_crlf.replaceAll("\\R(?!.*#)", ""));

// Pattern 2: remove a line break when the line after it is non-empty and contains no '#'
System.out.println(s_lf.replaceAll("(?m)\\R(?=[^\n#]+$)", ""));
System.out.println(s_crlf.replaceAll("(?m)\\R(?=[^\n#]+$)", ""));

All test cases - with strings having CRLF and LF line endings - result in

this tis a fine #line1
this tis another fine #line2this_belongs_to abobe line
this tis still is OK #line4
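
As for the question of whether there is a way other than a single replacement regex: one possible alternative is to split the text into lines and glue every line without the marker onto the previous one. This is only a sketch under assumptions. The class JoinContinuationLines and the method joinLinesWithoutMarker are hypothetical names, the split still uses \R to recognise line breaks, and the result is normalised to LF line endings (unlike replaceAll, which keeps the original CRLF sequences that are not removed).

```java
class JoinContinuationLines {
    // Hypothetical helper: joins each line that lacks the marker onto the preceding line.
    // Kept line breaks are written as '\n', regardless of the input line ending.
    static String joinLinesWithoutMarker(String text, char marker) {
        StringBuilder out = new StringBuilder();
        // Limit -1 keeps trailing empty strings, so a final line break is not silently lost.
        String[] lines = text.split("\\R", -1);
        for (int i = 0; i < lines.length; i++) {
            // Start a new output line only when this line carries the marker.
            if (i > 0 && lines[i].indexOf(marker) >= 0) {
                out.append('\n');
            }
            out.append(lines[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "this tis a fine #line1\nthis tis another fine #line2\nthis_belongs_to abobe line\nthis tis still is OK #line4";
        System.out.println(joinLinesWithoutMarker(s, '#'));
    }
}
```

Like the \R(?!.*#) pattern, this version also joins empty lines and drops a trailing line break, because neither contains the marker.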