Regular expression to remove substring with known start and end pattern, with multiple words in betw

Time:11-15

This is the example substring I'm trying to remove:

On Feb 4, 2018 11:00 PM,

The problem is that it can appear multiple times in a string. I just want to remove it, i.e. replace it with an empty string ('').

A lazy way would be to rely on the known length of the pattern and cut it out with substring.

I have tried a regex like this:

str.replaceAll(/^On*PM,$/g, '')

where ^ and $ indicate the start and end, but I'm missing the spaces and the multiple words in between.

The months and times are dynamic, but there are not that many combinations.
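To see why the attempted pattern removes nothing, it may help to run it. In a regex, `n*` means "zero or more `n` characters", not "anything in between", and `^`/`$` anchor the match to the whole string. The sketch below uses made-up sample strings and a hypothetical helper `matchesAttempt`:

```javascript
// The pattern from the question, without the g flag so test() stays stateless.
const attempt = /^On*PM,$/;

// Hypothetical helper, only for this illustration.
function matchesAttempt(text) {
  return attempt.test(text);
}

// Made-up sample input.
const sample = "On Feb 4, 2018 11:00 PM, Alice wrote: hi";

console.log(matchesAttempt("On Feb 4, 2018 11:00 PM,")); // false
console.log(matchesAttempt("OnnnPM,"));                  // true
console.log(sample.replace(/^On*PM,$/g, "") === sample); // true: nothing removed
```

The only strings this pattern accepts are an `O`, any number of `n`s, and then `PM,` with nothing before or after.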

CodePudding user response:

Use

/\bOn\s+(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sept?|Oct|Nov|Dec)\s+(?:0?[1-9]|[12]\d|3[01]),\s*\d{4}\s*(?:0?[0-9]|1[0-2]):[0-5]?[0-9]\s*[AP]M,/gi

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  On                       'On'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    Jan                      'Jan'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    Feb                      'Feb'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    Mar                      'Mar'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    Apr                      'Apr'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    May                      'May'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    Jun                      'Jun'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    Jul                      'Jul'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    Aug                      'Aug'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    Sep                      'Sep'
--------------------------------------------------------------------------------
    t?                       't' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    Oct                      'Oct'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    Nov                      'Nov'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    Dec                      'Dec'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    0?                       '0' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    [1-9]                    any character of: '1' to '9'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    [12]                     any character of: '1', '2'
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    3                        '3'
--------------------------------------------------------------------------------
    [01]                     any character of: '0', '1'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  ,                        ','
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \d{4}                    digits (0-9) (4 times)
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    0?                       '0' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    [0-9]                    any character of: '0' to '9'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    1                        '1'
--------------------------------------------------------------------------------
    [0-2]                    any character of: '0' to '2'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
  [0-5]?                   any character of: '0' to '5' (optional
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  [0-9]                    any character of: '0' to '9'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  [AP]                     any character of: 'A', 'P'
--------------------------------------------------------------------------------
  M,                       'M,'
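As a minimal sketch of how this pattern might be applied, the snippet below uses it with `String.prototype.replace`, writing `\s+` where the explanation says "1 or more times". The input string and the names `strictPrefix` and `cleaned` are made up for illustration:

```javascript
// Strict pattern: only real month abbreviations, days 1-31, hours 0-12.
const strictPrefix = /\bOn\s+(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sept?|Oct|Nov|Dec)\s+(?:0?[1-9]|[12]\d|3[01]),\s*\d{4}\s*(?:0?[0-9]|1[0-2]):[0-5]?[0-9]\s*[AP]M,/gi;

// Made-up input containing the substring twice.
const input =
  "On Feb 4, 2018 11:00 PM, Bob wrote: ok. On Sept 12, 2019 9:05 am, Ann wrote: fine.";

// With the g flag, replace() removes every occurrence, not just the first.
const cleaned = input.replace(strictPrefix, "");

console.log(JSON.stringify(cleaned));
// " Bob wrote: ok.  Ann wrote: fine."
```

The whitespace that followed each removed substring is left in place, so a `.trim()` or a trailing `\s*` in the pattern may be wanted depending on the data. `replaceAll` also accepts this regex, since it carries the `g` flag.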

Or, simpler one if the strings you deal with are in good shape:

/\bOn\s+\w+\s+\d{1,2},\s*\d{4}\s*\d{1,2}:\d{1,2}\s*[AP]M,/gi

See this proof.

EXPLANATION

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  On                       'On'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \d{1,2}                  digits (0-9) (between 1 and 2 times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  ,                        ','
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \d{4}                    digits (0-9) (4 times)
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \d{1,2}                  digits (0-9) (between 1 and 2 times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
  \d{1,2}                  digits (0-9) (between 1 and 2 times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  [AP]                     any character of: 'A', 'P'
--------------------------------------------------------------------------------
  M,                       'M,'
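The trade-off of the simpler pattern can be sketched as follows: it removes the well-formed substring just the same, but it also accepts values that are not real dates. The sample strings and the helper name `stripLoose` are assumptions for illustration only:

```javascript
// Looser pattern: any word for the month, any 1-2 digits for day, hour, minute.
const loosePrefix = /\bOn\s+\w+\s+\d{1,2},\s*\d{4}\s*\d{1,2}:\d{1,2}\s*[AP]M,/gi;

// Hypothetical helper: remove every occurrence of the prefix.
function stripLoose(text) {
  return text.replace(loosePrefix, "");
}

console.log(JSON.stringify(stripLoose("On Feb 4, 2018 11:00 PM, hello")));
// " hello"

// Not a real date, but the loose pattern removes it anyway.
console.log(JSON.stringify(stripLoose("On Foo 99, 2018 77:88 PM, hello")));
// " hello"
```

If the input is known to contain only well-formed dates, this looseness is harmless and the pattern is easier to read and maintain.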