I have the following regular expression:
/\.(?![^"]*"(?:(?:[^"]*"){2})*[^"]*$)(?=[^\.]+)/g
I use it with JavaScript's String.split to split a string on periods, but only when the period is not inside double quotes ""
and does not occur at the end of the string.
It seems to work well for simple cases: 'hello ."world . works".well.'
yields ['hello ', '"world . works"', 'well.'].
But with this more complex example, 'test."One .word".A short sentence." no split. " no split.".',
it splits incorrectly into ['test."One ', 'word".A short sentence." no split', ' " no split.".']
The expected output is ['test', '"One .word"', 'A short sentence', '" no split. " no split', '".']
I've run out of ideas how to fix this. Any help is greatly appreciated.
CodePudding user response:
Match the pieces instead of splitting on the separators. Use
/(?:"[^"]*"|[^.])+(?:\.+$)?/g
JavaScript code:
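const regex = /(?:"[^"]*"|[^.])+(?:\.+$)?/g;
const str = `test."One .word".A short sentence." no split. " no split.".`;
// Collect the matching segments rather than splitting on the periods.
console.log(str.match(regex));
// ['test', '"One .word"', 'A short sentence', '" no split. " no split', '".']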
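As a quick check, the same pattern can be wrapped in a small helper and run against both strings from the question. This is only a minimal sketch: the name splitOutsideQuotes is hypothetical and not part of the original answer, and the fallback to an empty array is an added assumption about how a caller would want "no match" handled.

```javascript
// Hypothetical helper (the name is illustrative): returns the segments between
// periods, treating a complete "..." run as a single unit.
function splitOutsideQuotes(input) {
  const regex = /(?:"[^"]*"|[^.])+(?:\.+$)?/g;
  // String.prototype.match returns null when nothing matches, so fall back to [].
  return input.match(regex) ?? [];
}

const simple = splitOutsideQuotes('hello ."world . works".well.');
const complex = splitOutsideQuotes('test."One .word".A short sentence." no split. " no split.".');

console.log(simple);  // ['hello ', '"world . works"', 'well.']
console.log(complex); // ['test', '"One .word"', 'A short sentence', '" no split. " no split', '".']
```

One difference from String.split is worth keeping in mind: because this collects matches, empty segments are dropped, so 'a..b' gives ['a', 'b'] rather than ['a', '', 'b'].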
EXPLANATION
NODE          EXPLANATION
--------------------------------------------------------------------------------
(?:           group, but do not capture (1 or more times
              (matching the most amount possible)):
--------------------------------------------------------------------------------
  "           '"'
--------------------------------------------------------------------------------
  [^"]*       any character except: '"' (0 or more times
              (matching the most amount possible))
--------------------------------------------------------------------------------
  "           '"'
--------------------------------------------------------------------------------
 |            OR
--------------------------------------------------------------------------------
  [^.]        any character except: '.'
--------------------------------------------------------------------------------
)+            end of grouping
--------------------------------------------------------------------------------
(?:           group, but do not capture (optional):
--------------------------------------------------------------------------------
  \.+         '.' (1 or more times (matching the most
              amount possible))
--------------------------------------------------------------------------------
  $           the end of the string (in JavaScript, without
              the m flag, only the very end of the input)
--------------------------------------------------------------------------------
)?            end of grouping
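
To make the breakdown above easier to follow, the pattern can also be assembled from named pieces, one per row of the explanation. This is a sketch for illustration only: the variable names (quotedRun, nonPeriod, trailingPeriods, assembled) are hypothetical, and the sample string is made up for the demonstration.

```javascript
// Each piece mirrors one part of the explanation.
const quotedRun = '"[^"]*"';          // opening quote, any non-quote characters, closing quote
const nonPeriod = '[^.]';             // any single character except '.'
const trailingPeriods = '(?:\\.+$)?'; // optional: one or more periods at the very end of the input

// (?: quotedRun | nonPeriod )+ followed by the optional trailing periods.
const assembled = new RegExp(`(?:${quotedRun}|${nonPeriod})+${trailingPeriods}`, 'g');

console.log(assembled.source);               // (?:"[^"]*"|[^.])+(?:\.+$)?
console.log('a."b.c".d.'.match(assembled));  // ['a', '"b.c"', 'd.']
```

The quoted alternative is listed first so that a complete "..." run is consumed as one unit before the engine falls back to single non-period characters. That ordering is also why an unmatched trailing quote, as in the last segment of the question's example, is still picked up by [^.].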