I have these sample sentences in a plain text file (UTF-8):
today is an interest-
ing day
The "-" on the first line is followed only by \n (I have already stripped all \r from the file, to deal uniformly with different sources).
I would like to join the 2 lines into 1 line, because the "-" means that the preceding word has been split and continues on the next line.
To join lines like these, I have tried something along these lines:
text.replace(/[\n-]/g, "")
but it does not seem to work. What is the right way to achieve this?
I would like to be able to deal with both these possible endings (or similar situations you might anticipate):
interest-\n
interest- \n (possible blanks inserted before \n)
CodePudding user response:
You can use either of the following:
text.replace(/\b-\s*\n\b/g, "")
text.replace(/\b-[^\S\r\n]*\n\b/g, "")
See the regex demo. Details:
\b - a word boundary
- - a hyphen
\s* - zero or more whitespaces / [^\S\r\n]* - zero or more horizontal whitespaces (supporting CRLF, CR and LF endings)
\n - a newline char
\b - a word boundary.
See the JavaScript demo:
console.log( "today is an interest- \ning day".replace(/\b-\s*\n\b/g, "") );
console.log( "today is an interest-\ning day".replace(/\b-\s*\n\b/g, "") );
A Unicode-aware pattern that checks for just letters on both ends can look like text.replace(/(?<=\p{L}\p{M}*)-[^\S\r\n]*\n(?=\p{L})/gu, ""), where (?<=\p{L}\p{M}*) checks for a letter plus optional diacritics before - and (?=\p{L}) checks for a letter after a newline. See the regex demo.
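As a rough sketch of how the two patterns differ on non-ASCII text: in JavaScript, \b is based on ASCII word characters, so the \b version does not fire after a letter such as é, while the \p{L} version does. The helper names below are hypothetical and only wrap the patterns quoted above; the lookbehind assumes a reasonably recent engine (for example a current Node.js).

```javascript
// Hypothetical helper wrapping the ASCII word-boundary pattern from above.
function joinAscii(text) {
  return text.replace(/\b-[^\S\r\n]*\n\b/g, "");
}

// Hypothetical helper wrapping the Unicode-aware pattern from above.
function joinUnicode(text) {
  return text.replace(/(?<=\p{L}\p{M}*)-[^\S\r\n]*\n(?=\p{L})/gu, "");
}

const ascii = "today is an interest- \ning day";
const accented = "caf\u00e9-\nteria"; // "é" is not an ASCII word character
const listItem = "notes\n- item";     // leading hyphen, not a split word

console.log(joinAscii(ascii));                      // today is an interesting day
console.log(joinUnicode(ascii));                    // today is an interesting day
console.log(JSON.stringify(joinAscii(accented)));   // "café-\nteria" (unchanged)
console.log(JSON.stringify(joinUnicode(accented))); // "caféteria"
console.log(JSON.stringify(joinUnicode(listItem))); // "notes\n- item" (unchanged)
```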
CodePudding user response:
There are three things wrong in your regex for this use:
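You have the newline before the -, but in the text the - comes first
The [] is a character class: it matches any single one of the listed characters, not the sequence, so every - and every \n gets removed
You need to add \s* to allow whitespace between the - and the newline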
So try this:
text.replace(/-\s*\n/g, "")
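To see the difference, here is a small sketch comparing the original attempt with the pattern above. The function names are made up for illustration. Note that this simple version has no word-boundary check, and because \s also matches newlines it would swallow any blank lines that directly follow the hyphen.

```javascript
// Original attempt: the character class removes every newline and every hyphen.
const originalAttempt = (text) => text.replace(/[\n-]/g, "");

// Suggested fix: a hyphen, optional whitespace, then a newline, as one sequence.
const joinLines = (text) => text.replace(/-\s*\n/g, "");

const compound = "a well-known\nfact";
console.log(JSON.stringify(originalAttempt(compound))); // "a wellknownfact"
console.log(JSON.stringify(joinLines(compound)));       // "a well-known\nfact"

// Both endings from the question are handled.
console.log(joinLines("today is an interest-\ning day"));  // today is an interesting day
console.log(joinLines("today is an interest- \ning day")); // today is an interesting day
```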