regex php look ahead number

Time:11-21

stackers!

I have been trying to figure this out for some time but no luck.
(.*?(?:\.|\?|!))(?: |$)

The above pattern captures every sentence in a paragraph, breaking after each ending punctuation mark.
Example:

  1. Today is the greatest. You are the greatest.

The match comes back with three results:

Match {
1.
Today is the greatest.
You are the greatest.
}

However, I am trying to get it not to break when a number is followed by a period, and would like to see the following match instead:

Match {
1. Today is the greatest.
You are the greatest.
}

Thanks in advance for your help.

CodePudding user response:

Use

.*?[.?!](?=(?<!\d\.)\s+|\s*$)

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  .*?                      any character except \n (0 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
  [.?!]                    any character of: '.', '?', '!'
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
      \d                       digits (0-9)
--------------------------------------------------------------------------------
      \.                       '.'
--------------------------------------------------------------------------------
    )                        end of look-behind
--------------------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
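
As a rough check of the pattern, here is a minimal PHP sketch that runs it over the sample sentence from the question. It assumes the quantifier in the first look-ahead branch is `\s+` (the "1 or more times" described in the explanation). The `~` delimiter and the variable names are arbitrary choices for illustration. Because the look-ahead does not consume the whitespace between sentences, every match after the first starts with that whitespace, so the sketch trims each result.

```php
<?php
// Pattern from the answer, wrapped in '~' delimiters (an arbitrary choice).
// Single quotes keep the backslashes intact for PCRE.
$pattern = '~.*?[.?!](?=(?<!\d\.)\s+|\s*$)~';

$text = '1. Today is the greatest. You are the greatest.';

// preg_match_all() stores every full match, in order, in $matches[0].
preg_match_all($pattern, $text, $matches);

// The look-ahead leaves the separating whitespace unconsumed, so it ends up
// at the start of the next match; trim() removes it.
$sentences = array_map('trim', $matches[0]);

print_r($sentences);
// Array
// (
//     [0] => 1. Today is the greatest.
//     [1] => You are the greatest.
// )
```

The "1." no longer ends a match because, at the position right after it, the look-behind `(?<!\d\.)` sees a digit followed by a period and rejects the first branch, while the second branch fails because the string does not end there.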