The following regex splits a string into chunks of a given character length (e.g. 20 characters) while being word-aware:
\b[\w\s]{20,}?(?=\s)|.+$
This means that if the length limit would cut a word in the middle, the whole word is taken instead:
const str = "this is an input example of one sentence that contains a bit of words and must be split"
const substringMaxLength = 20;
const regex = new RegExp(`\\b[\\w\\s]{${substringMaxLength},}?(?=\\s)|.+$`, 'g');
const substrings = str.match(regex);
console.log(substrings);
However, as can be seen when running the snippet above, every substring after the first keeps its leading whitespace. Can that whitespace be excluded, so that we end up with this?
[
"this is an input example",
"of one sentence that",
"contains a bit of words",
"and must be split"
]
I tried adding [^\s], (?:\s), or (?!\s) in various places, but couldn't get it to work.
How can it be done?
CodePudding user response:
You can require that every match starts with \w, in both alternatives of your current regex. Since \w already consumes one character, the quantifier uses the length minus 1:
const str = "this is an input example of one sentence that contains a bit of words and must be split"
const substringMaxLength = 20;
const regex = new RegExp(`\\b\\w(?:[\\w\\s]{${substringMaxLength-1},}?(?=\\s)|.*$)`, 'g');
const substrings = str.match(regex);
console.log(substrings);
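For reuse, the same pattern can be wrapped in a small function. This is only a sketch: the name splitWordAware and the argument check are my own additions, not part of the pattern above, and match() returning null is normalized to an empty array.

```javascript
// Hypothetical helper (the name is an assumption): wraps the pattern from this answer.
function splitWordAware(text, maxLength) {
  if (!Number.isInteger(maxLength) || maxLength < 1) {
    throw new RangeError("maxLength must be a positive integer");
  }
  // \w consumes the first character, so the quantifier needs maxLength - 1 more.
  const pattern = new RegExp(
    `\\b\\w(?:[\\w\\s]{${maxLength - 1},}?(?=\\s)|.*$)`,
    "g"
  );
  // match() returns null when nothing matches; fall back to an empty array.
  return text.match(pattern) || [];
}

const chunks = splitWordAware(
  "this is an input example of one sentence that contains a bit of words and must be split",
  20
);
console.log(chunks);
// [ 'this is an input example', 'of one sentence that',
//   'contains a bit of words', 'and must be split' ]
```

Keep in mind that [\w\s] does not cover punctuation, so input containing commas or periods may fall through to the .*$ alternative and come back as one long chunk.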
CodePudding user response:
This is how you can do it:
const regex = new RegExp(`\\b((?:[^\\s]+\\s?){${substringMaxLength},}?)(?=\\s)|.+$`, 'g');
The regex wraps a non-capturing group (?:[^\s]+\s?) in a capturing group and follows it with a positive lookahead (?=\s) to prevent the trailing whitespace from being captured.
The lookahead checks whether whitespace follows the group; if it does, the match succeeds without consuming that whitespace.
A positive lookbehind (?<=\s) would make sure that whitespace precedes the group; the pattern shown here anchors the start with the word boundary \b instead.
The first alternative with the length filled in and an extra \b before the lookahead:
\b((?:[^\s]+\s?){20,}?)\b(?=\s)
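To see what "not captured" means for a lookahead, here is a minimal sketch on a made-up two-word string (the sample string and variable names are my own, chosen only for illustration). The lookahead version stops before the space, while the version that matches \s directly includes it:

```javascript
const sample = "foo bar";

// (?=\s) only asserts that whitespace follows; it is not part of the match.
const withLookahead = sample.match(/\w+(?=\s)/)[0];

// Matching \s directly consumes the space and includes it in the result.
const withConsumedSpace = sample.match(/\w+\s/)[0];

console.log(JSON.stringify(withLookahead));     // "foo"
console.log(JSON.stringify(withConsumedSpace)); // "foo "
```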
CodePudding user response:
Your pattern can start with a single word character, followed by the quantifier with the length minus 1.
The negative lookahead (?!\S) asserts a whitespace boundary to the right.
The alternative matches the rest of the line and also starts with a word character.
\b\w(?:[\w\s]{19,}?(?!\S)|.*)
const str = "this is an input example of one sentence that contains a bit of words and must be split"
const substringMaxLength = 20;
const regex = new RegExp(`\\b\\w(?:[\\w\\s]{${substringMaxLength-1},}?(?!\\S)|.*)`, 'g');
const substrings = str.match(regex);
console.log(substrings);
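The practical difference between (?=\s) and (?!\S) shows up at the end of the string: the first needs an actual whitespace character, the second only needs the absence of a non-whitespace character, so it also succeeds when nothing follows. A small sketch on a made-up string (the sample and variable names are only for illustration):

```javascript
const text = "foo bar";

// (?=\s) requires a whitespace character to follow, so "bar" at the end is skipped.
const needsWhitespace = text.match(/\w+(?=\s)/g);

// (?!\S) only forbids a non-whitespace character, so the end of the string also qualifies.
const allowsEndOfString = text.match(/\w+(?!\S)/g);

console.log(needsWhitespace);   // [ 'foo' ]
console.log(allowsEndOfString); // [ 'foo', 'bar' ]
```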