I'm using SubtitleEdit and I'd like to locate all the lines that do not contain a line break.
Lines that contain a line break are bilingual, which is what I want.
Lines without a line break are monolingual, and I'd like to quickly locate them all and delete them. TIA!
Alternatively, a regex that finds lines which do not contain any English characters would also work.
CodePudding user response:
You should use a regex lookahead assertion. Given these test lines:
something_1
some<br>thing_2
something_3<br>
<br>something_4
something_5
This expression will match lines 1 and 5:
^(?!.*<br>).*$
The negative lookahead assertion (?!.*<br>) rejects any line that contains <br>, so only lines without it can match. This relies on multiline mode being on, so that ^ and $ anchor at each line rather than at the whole text.
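To check the expression outside SubtitleEdit, here is a minimal Python sketch. It assumes the five test lines are joined with \n, and it uses Python's re module (not the .NET engine SubtitleEdit uses), with re.MULTILINE so that ^ and $ anchor at each line. The names TEST_LINES and NO_BR are only illustrative.

```python
import re

# The five test lines from above, joined into one text (assumed separator: "\n").
TEST_LINES = "\n".join([
    "something_1",
    "some<br>thing_2",
    "something_3<br>",
    "<br>something_4",
    "something_5",
])

# re.MULTILINE makes ^ and $ match at every line boundary.
# The negative lookahead rejects any line that contains "<br>".
NO_BR = re.compile(r"^(?!.*<br>).*$", re.MULTILINE)

matches = NO_BR.findall(TEST_LINES)
print(matches)  # ['something_1', 'something_5']
```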
CodePudding user response:
The confusion here was caused by two facts:
- What SubtitleEdit calls a line is actually multiline text, containing newlines.
- The newline displayed is not the one used internally (so it would never match <br>).
Solution 1:
Now that we have found out it uses either \r\n or just \n, we can write a regex:
(?-m)^(?!.*\r?\n)[\s\S]*$
Explanation:
- (?-m) turns off the multiline option (which is otherwise enabled).
- ^ matches from the start of the text.
- (?!.*\r?\n) is a negative lookahead for zero or more of any character followed by newline character(s), i.e. "contains a newline".
- [\s\S]*$ matches zero or more of ANY character (including newlines), so it matches the rest of the text.
In short: if we don't find newline characters, match everything.
Now replace the match with an empty string.
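The same logic can be tried in Python as a rough sketch. Python's re does not accept a global (?-m), so the equivalent here is simply not passing re.MULTILINE. The sample entries and the names NO_LINE_BREAK and is_single_line are made up for illustration, and the sketch assumes each subtitle entry's text is available as one string.

```python
import re

# Without re.MULTILINE, ^ and $ anchor to the whole text,
# which mirrors the effect of (?-m) in the pattern above.
NO_LINE_BREAK = re.compile(r"^(?!.*\r?\n)[\s\S]*$")

def is_single_line(text: str) -> bool:
    """Return True if the subtitle text contains no line break."""
    return NO_LINE_BREAK.match(text) is not None

# Hypothetical subtitle texts: two bilingual entries, one monolingual.
entries = ["Hello\n你好", "Only one line", "Bonjour\r\nHello"]

# Keep only the entries that contain a line break (the bilingual ones).
kept = [e for e in entries if not is_single_line(e)]
print(kept)
```

Only "Only one line" is matched by the pattern, so it is the one entry dropped from kept.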
Solution 2:
If you want to match lines that don't contain any English characters, you can use this:
(?-m)^(?![\s\S]*[a-zA-Z])[\s\S]*$
Explanation:
- (?-m) turns off the multiline option (which is otherwise enabled).
- ^ matches from the start of the text.
- (?![\s\S]*[a-zA-Z]) is a negative lookahead for ANY characters followed by an English character.
- [\s\S]*$ matches zero or more of ANY character (including newlines), so it matches the rest of the text.
In short: if we don't find an English character, match everything.
Now replace the match with an empty string.
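A comparable Python sketch for this second pattern, again run without re.MULTILINE in place of (?-m). The sample entries and the names NO_ENGLISH and has_no_english are hypothetical. Note that "English character" here means only the ASCII letters a-z and A-Z, so a text made of digits and punctuation is also matched.

```python
import re

# Matches the whole text only if no ASCII letter appears anywhere in it.
NO_ENGLISH = re.compile(r"^(?![\s\S]*[a-zA-Z])[\s\S]*$")

def has_no_english(text: str) -> bool:
    """Return True if the subtitle text contains no a-z or A-Z letter."""
    return NO_ENGLISH.match(text) is not None

# Hypothetical subtitle texts.
entries = ["你好\n世界", "Hello\n你好", "12:34"]

# Entries the pattern would match, i.e. candidates for deletion.
flagged = [e for e in entries if has_no_english(e)]
print(flagged)
```

Here "Hello\n你好" is the only entry left unflagged, since it is the only one containing an ASCII letter.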