PHP: Split a string at the first period that isn't the decimal point in a price or the last cha

Time:03-11

I want to split a string as per the parameters laid out in the title. I've tried a few different things including using preg_match with not much success so far and I feel like there may be a simpler solution that I haven't clocked on to.

I have a regex that matches the "price" mentioned in the title (see below).

/(?=.)\£(([1-9][0-9]{0,2}(,[0-9]{3})*)|[0-9]+)?(\.[0-9]{1,2})?/

And here are a few example scenarios and what my desired outcome would be:

Example 1:

input: "This string should not split as the only periods that appear are here £19.99 and also at the end."
output: n/a

Example 2:

input: "This string should split right here. As the period is not part of a price or at the end of the string."
output: "This string should split right here"

Example 3:

input: "There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price"
output: "There is a price in this string £19.99, but it should only split at this point"

CodePudding user response:

I suggest using

preg_split('~\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?(*SKIP)(*F)|\.(?!\s*$)~u', $string)

See the regex demo.

The regex first tries your price pattern, \£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?, and discards that match with (*SKIP)(*F); otherwise, it matches a non-final . with \.(?!\s*$) (a period is still treated as final if only whitespace characters follow it).
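To make the behaviour concrete, here is a minimal sketch that runs this pattern against the three examples from the question. The helper name splitAtFirstQualifyingDot() is made up for illustration, the price sub-pattern is assumed to read [0-9]+, and a limit of 2 is passed to preg_split so that only the first qualifying period is used.

```php
<?php
// Sketch only: splitAtFirstQualifyingDot() is a hypothetical helper name.
// Assumption: the price sub-pattern uses `[0-9]+` (one or more digits).
function splitAtFirstQualifyingDot(string $string): ?string
{
    $pattern = '~\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?(*SKIP)(*F)|\.(?!\s*$)~u';

    // A limit of 2 stops splitting after the first qualifying period.
    $parts = preg_split($pattern, $string, 2);

    if ($parts === false || count($parts) < 2) {
        return null; // no qualifying period (or a regex error)
    }

    return $parts[0];
}

$examples = [
    'This string should not split as the only periods that appear are here £19.99 and also at the end.',
    'This string should split right here. As the period is not part of a price or at the end of the string.',
    'There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price',
];

foreach ($examples as $example) {
    var_export(splitAtFirstQualifyingDot($example));
    echo PHP_EOL;
}
```

Returning null for "no split" is just one way to represent the n/a case from Example 1; the source file needs to be saved as UTF-8 for the £ literal to match under the u flag.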

If you really only need to split on the first occurrence of the qualifying dot you can use a matching approach:

preg_match('~^((?:\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?|[^.])+)\.(.*)~su', $string, $match)

See the regex demo. Here,

  • ^ - matches a string start position
  • ((?:\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?|[^.])+) - Group 1: one or more occurrences of your currency pattern or any one char other than a . char
  • \. - a . char
  • (.*) - Group 2: the rest of the string.
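A short sketch of how the matching approach might be wrapped up. The function name firstSentenceByMatch() is hypothetical, the quantifiers are assumed to be + as the "one or more occurrences" description suggests, and the check on Group 2 is an addition of mine: this pattern also matches when the only qualifying period is the final one, so an empty remainder is treated here as "no split".

```php
<?php
// Sketch only: firstSentenceByMatch() is a hypothetical helper name.
function firstSentenceByMatch(string $string): ?string
{
    $pattern = '~^((?:\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?|[^.])+)\.(.*)~su';

    if (preg_match($pattern, $string, $match) !== 1) {
        return null; // no period outside a price was found
    }

    // Assumption (not part of the pattern itself): if nothing but
    // whitespace follows the matched period, it was the final period,
    // so report "no split".
    if (trim($match[2]) === '') {
        return null;
    }

    return $match[1]; // Group 1: everything before the qualifying period
}

var_export(firstSentenceByMatch(
    'There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price'
));
echo PHP_EOL;
```

One caveat, as far as I can tell: if the only period in the string sits inside a price and there is no final period, backtracking can let \. match the price's decimal point. Making the outer quantifier possessive (++ instead of +) is one possible way to avoid that.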

CodePudding user response:

You could simply use this regex:

\. (that is, a period followed by a space)

Since you only have a space after the first sentence (and not a price), this should work just as well, right?
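Assuming the pattern suggested here is meant as "a period followed by whitespace", a minimal sketch could look like the following. The helper name splitAtDotSpace() is made up, and the lookahead (?=\S) is my own addition so that a final period followed only by trailing whitespace does not count as a split point.

```php
<?php
// Sketch only: splitAtDotSpace() is a hypothetical helper name.
function splitAtDotSpace(string $string): ?string
{
    // A period, then whitespace, then at least one non-whitespace char.
    // The decimal point in "£19.99" is followed by a digit, so it is skipped.
    $parts = preg_split('~\.\s+(?=\S)~', $string, 2);

    if ($parts === false || count($parts) < 2) {
        return null; // no period followed by further text
    }

    return $parts[0];
}

var_export(splitAtDotSpace(
    'There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price'
));
echo PHP_EOL;
```

This is simpler than the price-aware patterns, but it would also split after abbreviations such as "Jr. " or "etc. ", so it only fits input where a period plus space always ends a sentence.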

CodePudding user response:

To split a text into sentences while avoiding pitfalls such as decimal points or thousands separators in numbers and some abbreviations (like etc.), the best tool is IntlBreakIterator, which is designed to deal with natural language:

$str = 'There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price';

$si = IntlBreakIterator::createSentenceInstance('en-US');
$si->setText($str);
$si->next();

echo substr($str, 0, $si->current());

IntlBreakIterator::createSentenceInstance returns an iterator that gives the indexes of the different sentences in the string.

It takes ?, ! and ... into account too. In addition to handling the pitfalls of numbers and prices, it also works well with this kind of string:

$str = 'John Smith, Jr. was running naked through the garden crying "catch me! catch me!", but no one was chasing him. His psychiatrist looked at him from the window with a circumspect eye.';

More about rules used by IntlBreakIterator here.
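If every sentence is needed rather than just the first, the same iterator can be walked to the end. This is a rough sketch: the function name sentences() is hypothetical, it needs the intl extension, and it treats the returned offsets as byte offsets, in line with the substr() call above. Exact segmentation depends on the ICU rules in use, so no output is shown.

```php
<?php
// Sketch only: sentences() is a hypothetical helper; requires ext-intl.
function sentences(string $text, string $locale = 'en-US'): array
{
    $it = IntlBreakIterator::createSentenceInstance($locale);
    $it->setText($text);

    $result = [];
    $start = $it->first(); // offset of the first boundary (0)

    // next() returns the next boundary, or IntlBreakIterator::DONE at the end.
    for ($end = $it->next(); $end !== IntlBreakIterator::DONE; $end = $it->next()) {
        // Offsets are assumed to be byte offsets, as substr() expects.
        $result[] = substr($text, $start, $end - $start);
        $start = $end;
    }

    return $result;
}

if (extension_loaded('intl')) {
    $str = 'There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price';
    foreach (sentences($str) as $sentence) {
        echo rtrim($sentence), PHP_EOL;
    }
}
```

Each piece keeps its closing punctuation and any trailing whitespace, so rtrim() (and stripping the final period, if wanted) is left to the caller.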
