JavaScript regular expression: split a string by periods that are not inside double quotes

Time:11-22

I use the regular expression /\.(?![^"]*"(?:(?:[^"]*"){2})*[^"]*$)(?=[^\.]+)/g with JavaScript's String.split to split a string on periods, but only where the period is not inside double quotes "" and is not at the end of the string.

It seems to work for simple cases: 'hello ."world . works".well.' yields ['hello ', '"world . works"', 'well.'].

However, the more complex input 'test."One .word".A short sentence." no split. " no split.".' is split incorrectly into ['test."One ', 'word".A short sentence." no split', ' " no split.".']

The expected output is ['test', '"One .word"', 'A short sentence', '" no split. " no split', '".']

I've run out of ideas on how to fix this. Any help is greatly appreciated.

CodePudding user response:

Use

/(?:"[^"]*"|[^.])+(?:\.+$)?/g

See regex proof.

JavaScript code:

const regex = /(?:"[^"]*"|[^.])+(?:\.+$)?/g;
const str = `test."One .word".A short sentence." no split. " no split.".`;
console.log(str.match(regex));
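
For reuse, the match-based approach can be wrapped in a small function. This is only a sketch: the name splitOutsideQuotes is hypothetical, and the fallback to an empty array is an assumption about what a caller would want when nothing matches. It runs the pattern against both inputs from the question.

```javascript
// Hypothetical helper; the pattern is the one given in the answer.
function splitOutsideQuotes(input) {
  const pattern = /(?:"[^"]*"|[^.])+(?:\.+$)?/g;
  // String.prototype.match returns null when there is no match
  // (for example on an empty string), so fall back to an empty array.
  return input.match(pattern) ?? [];
}

const simple = splitOutsideQuotes('hello ."world . works".well.');
const complex = splitOutsideQuotes(
  'test."One .word".A short sentence." no split. " no split.".'
);

console.log(simple);  // ['hello ', '"world . works"', 'well.']
console.log(complex); // ['test', '"One .word"', 'A short sentence', '" no split. " no split', '".']
```

Because this collects matches instead of splitting on a separator, consecutive periods outside quotes do not produce empty strings the way String.split would.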

EXPLANATION

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (1 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    "                        '"'
--------------------------------------------------------------------------------
    [^"]*                    any character except: '"' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    "                        '"'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    [^.]                     any character except: '.'
--------------------------------------------------------------------------------
  )+                       end of grouping
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional):
--------------------------------------------------------------------------------
    \.+                      '.' (1 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
--------------------------------------------------------------------------------
  )?                       end of grouping
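
The same pattern can also be assembled from named pieces that mirror the nodes in the table above. This is an illustrative sketch only; the variable names are made up for this example. It also shows one edge case worth knowing: a double quote with no closing partner is consumed by the [^.] branch, so it does not protect the periods after it.

```javascript
// Each piece corresponds to one node of the explanation (names are illustrative).
const quoted = '"[^"]*"';                // a complete double-quoted run
const nonPeriod = '[^.]';                // any single character except '.'
const trailingPeriods = '(?:\\.+$)?';    // optional periods at the very end

const pattern = new RegExp(
  `(?:${quoted}|${nonPeriod})+${trailingPeriods}`,
  'g'
);

console.log(pattern.source); // (?:"[^"]*"|[^.])+(?:\.+$)?

// Unbalanced quote: the lone '"' falls through to the [^.] branch,
// so the period after "b" still ends the piece.
const unbalanced = 'a."b.c'.match(pattern);
console.log(unbalanced); // ['a', '"b', 'c']
```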