What is the regex to find lines WITHOUT a line break

Time:10-27

I'm using SubtitleEdit and I'd like to locate all the lines that do not contain a line break.

Lines containing a line break are bilingual, which is what I want.

Those without line breaks are monolingual, and I'd like to quickly locate and delete them all. TIA!

Alternatively, if there is a regex that finds lines which do not contain any English characters, that would also work.

Answer:

You can use a regex assertion (a negative lookahead). Given these test lines:

something_1
some<br>thing_2
something_3<br>
<br>something_4
something_5

The following expression matches lines 1 and 5:

^(?!.*<br>).*$

The negative lookahead assertion (?!.*<br>) rejects any line that contains <br> anywhere; for every other line, .*$ then matches the whole line. Multiline mode must be on so that ^ and $ match at the start and end of each line.
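A quick way to check this is to run the same pattern with Python's `re` module. Python is used here only for illustration (SubtitleEdit itself uses .NET regex), but for this pattern the behavior is the same. `re.MULTILINE` is what makes `^` and `$` match at each line boundary; the sample text is simply the five test lines above joined with newlines.

```python
import re

# The five test lines from above, joined with newlines.
text = "\n".join([
    "something_1",
    "some<br>thing_2",
    "something_3<br>",
    "<br>something_4",
    "something_5",
])

# With re.MULTILINE, ^ and $ match at every line boundary.
# The lookahead (?!.*<br>) rejects a line if <br> appears anywhere in it;
# "." does not match "\n", so the lookahead never looks past the current line.
pattern = re.compile(r"^(?!.*<br>).*$", re.MULTILINE)

matches = pattern.findall(text)
print(matches)  # ['something_1', 'something_5']
```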

Answer:

The confusion here was caused by two facts:

  1. What SubtitleEdit calls a line is actually a multi-line text that can contain newlines.
  2. The line break shown on screen is not how it is stored internally, so a search for <br> would never match it.
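A minimal illustration of point 2 in Python, assuming a two-line subtitle is stored with a `\r\n` line break (the subtitle text itself is a made-up example): a search for `<br>` finds nothing, while a search for the real newline characters does.

```python
import re

# Hypothetical bilingual subtitle text as stored internally:
# the visible line break is a real "\r\n", not the characters "<br>".
stored = "Good morning.\r\n早上好。"

print(re.search(r"<br>", stored))         # None
print(bool(re.search(r"\r?\n", stored)))  # True
```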

Solution 1:

Since it uses either \r\n or just \n as the line break, we can write this regex:

(?-m)^(?!.*\r?\n)[\s\S]*$

Explanation:

(?-m) - turn off the multiline option (which is otherwise enabled).

^ - match at the start of the text

(?!.*\r?\n) - negative lookahead: fail if any characters followed by a newline (\r\n or \n) can be found, i.e. if the text contains a line break.

[\s\S]*$ - match zero or more of any character (including newlines) up to the end, i.e. the rest of the text.

In short: if there are no newline characters, match the entire text.

Then replace the matches with an empty string.
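A sketch of Solution 1 in Python, treating each SubtitleEdit "line" (one subtitle's text) as a separate string; the sample texts are invented. Python's `re` leaves multiline mode off unless `re.MULTILINE` is passed, so the `(?-m)` prefix is dropped here and the rest of the pattern is unchanged.

```python
import re

# Solution 1 minus the (?-m) prefix: without re.MULTILINE, ^ and $ refer
# to the start and end of the whole string.
SINGLE_LINE = re.compile(r"^(?!.*\r?\n)[\s\S]*$")

# Invented sample subtitle texts, one string per subtitle.
subtitles = [
    "Good morning.",              # no line break -> monolingual
    "Good morning.\r\n早上好。",   # \r\n line break -> bilingual
    "Thank you.\n谢谢。",          # \n line break -> bilingual
]

# Replace every match with an empty string.
cleaned = [SINGLE_LINE.sub("", text) for text in subtitles]
print(cleaned)
# ['', 'Good morning.\r\n早上好。', 'Thank you.\n谢谢。']
```

The trailing `[\s\S]*$` is what makes a match cover the whole text, so the replacement empties a monolingual subtitle completely instead of leaving a remainder.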

Solution 2:

If you want to match lines that don't contain any English letters, you can use this:

(?-m)^(?![\s\S]*[a-zA-Z])[\s\S]*$

Explanation:

(?-m) - turn off the multiline option (which is otherwise enabled).

^ - match at the start of the text

(?![\s\S]*[a-zA-Z]) - negative lookahead: fail if an English letter (a-z or A-Z) appears anywhere in the text, newlines included.

[\s\S]*$ - match zero or more of any character (including newlines) up to the end, i.e. the rest of the text.

In short: if there is no English letter, match the entire text.

Then replace the matches with an empty string.
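The same kind of Python sketch for Solution 2, again dropping `(?-m)` because Python's `re` has multiline mode off by default; the sample texts are invented. Note that `[a-zA-Z]` covers only ASCII letters, so accented letters such as é do not count as English here.

```python
import re

# Solution 2 minus the (?-m) prefix. [\s\S] matches any character,
# so the lookahead scans the whole text, across line breaks.
NO_ENGLISH = re.compile(r"^(?![\s\S]*[a-zA-Z])[\s\S]*$")

# Invented sample subtitle texts, one string per subtitle.
subtitles = [
    "早上好。",                    # no English letters -> matched
    "Good morning.\r\n早上好。",   # contains English -> not matched
    "谢谢。\n再见。",               # two lines, no English -> matched
]

# Replace every match with an empty string.
cleaned = [NO_ENGLISH.sub("", text) for text in subtitles]
print(cleaned)
# ['', 'Good morning.\r\n早上好。', '']
```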
