Split paragraph at every instance of a number?

Time:09-23

I have a paragraph of text like this:

"2 lb thawed chicken breasts1/2 cup jarred red salsa try and choose a basic 2 carbs/2 tbs option- we love the double roasted from Trader Joes5 cloves garlic minced I use the frozen dorot cubes every time!1 tbs liquid smoke sold near bottled barbecue sauce at any normal store1-2 tbs chipotle peppers in adobo sauce."

and I would like it to look like this:

  • 2 lb thawed chicken breasts
  • 1/2 cup jarred red salsa try and choose a basic 2 carbs/2 tbs option- we love the double roasted from Trader Joes
  • 5 cloves garlic minced I use the frozen dorot cubes every time!
  • 1 tbs liquid smoke sold near bottled barbecue sauce at any normal store
  • 1-2 tbs chipotle peppers in adobo sauce.

I understand it might be impossible to distinguish between the serving amounts and random numbers in the text, but I can go through and correct those instances by hand.

I have tried the following:

function makeList(text){
    let textString = new String(text)
    let newText = textString.split(/(?=\d+(\/\d+|\.\d+))/g)
    return newText
}

but it doesn't seem to split the paragraph correctly.

Thank you for the help!

Answer:

If you don't need to pair each number format with a specific unit, you can get all the matches using:

\d+(?:[\/-]\d+)? (?:lb|cup|cloves|tbs).*?(?=\d+(?:[\/-]\d+)? (?:lb|cup|cloves|tbs)|$)

The pattern in parts matches:

  • \d+(?:[\/-]\d+)? Match 1+ digits, followed by an optional part that matches either / or - and then 1+ digits
  • (?:lb|cup|cloves|tbs) Match a space, then any of the alternatives
  • .*? Match as few characters as possible
  • (?= Positive lookahead, assert that what is directly to the right is
    • \d+(?:[\/-]\d+)? (?:lb|cup|cloves|tbs) The number and unit pattern
    • | Or
    • $ End of string
  • ) Close the lookahead
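To see the number-and-unit piece on its own, here is a small check that anchors it to the start of a string and tests a few quantities taken from the recipe text. The `samples` list is just for illustration.

```javascript
// The number-and-unit part of the pattern, anchored to the start of the string
const quantity = /^\d+(?:[\/-]\d+)? (?:lb|cup|cloves|tbs)/;

const samples = ["2 lb", "1/2 cup", "1-2 tbs", "5 cloves", "2 carbs"];

// "2 carbs" fails because "carbs" is not one of the listed units
const results = samples.map((s) => quantity.test(s));
console.log(results); // [ true, true, true, true, false ]
```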


// Each match is one ingredient: a quantity and unit, then everything up to the next quantity and unit (or the end)
const regex = /\d+(?:[\/-]\d+)? (?:lb|cup|cloves|tbs).*?(?=\d+(?:[\/-]\d+)? (?:lb|cup|cloves|tbs)|$)/gm;
const str = `2 lb thawed chicken breasts1/2 cup jarred red salsa try and choose a basic 2 carbs/2 tbs option- we love the double roasted from Trader Joes5 cloves garlic minced I use the frozen dorot cubes every time!1 tbs liquid smoke sold near bottled barbecue sauce at any normal store1-2 tbs chipotle peppers in adobo sauce.`;
console.log(str.match(regex));
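One caveat: as written, the lookahead also fires on the "2 tbs" inside "2 carbs/2 tbs", so the salsa line is cut after "carbs/" and "2 tbs option- we love the double roasted from Trader Joes" comes out as a separate item. A possible tweak, sketched below and assuming an engine with ES2018 lookbehind support, is to reject a number that directly follows a digit, "/" or "-". The `quantity` and `items` names are my own.

```javascript
const str = `2 lb thawed chicken breasts1/2 cup jarred red salsa try and choose a basic 2 carbs/2 tbs option- we love the double roasted from Trader Joes5 cloves garlic minced I use the frozen dorot cubes every time!1 tbs liquid smoke sold near bottled barbecue sauce at any normal store1-2 tbs chipotle peppers in adobo sauce.`;

// Number-and-unit pattern. The lookbehind (?<![\d\/-]) rejects a number that
// directly follows a digit, "/" or "-", such as the "2" in "carbs/2 tbs".
const quantity = String.raw`(?<![\d\/-])\d+(?:[\/-]\d+)? (?:lb|cup|cloves|tbs)`;

// A quantity, then as few characters as possible up to the next quantity or the end
const regex = new RegExp(`${quantity}.*?(?=${quantity}|$)`, "g");

const items = str.match(regex);
console.log(items);
```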

Answer:

It seems that the numbers you want to split on do NOT follow a space, a slash, or a hyphen.

In that case you can use this regex:

/(?<![ \/-])(?=\d)/g

The regex starts with a negative lookbehind for a space, a slash, or a hyphen, meaning "don't match on numbers following one of these." You can, of course, add or remove characters here.

It then uses a positive lookahead for a digit, so it matches the empty string right before that digit. Splitting on this zero-width match keeps the digit at the start of the next piece.
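A quick way to see what the lookbehind adds is to split a short fragment of the recipe with and without it. This is only an illustration, and it needs an engine with ES2018 lookbehind support.

```javascript
const sample = "breasts1/2 cup";

// A bare lookahead splits before every digit, including the "2" of "1/2"
const plain = sample.split(/(?=\d)/);

// The negative lookbehind skips digits preceded by a space, "/" or "-"
const guarded = sample.split(/(?<![ \/-])(?=\d)/);

console.log(plain);   // [ 'breasts', '1/', '2 cup' ]
console.log(guarded); // [ 'breasts', '1/2 cup' ]
```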

const text = '2 lb thawed chicken breasts1/2 cup jarred red salsa try and choose a basic 2 carbs/2 tbs option- we love the double roasted from Trader Joes5 cloves garlic minced I use the frozen dorot cubes every time!1 tbs liquid smoke sold near bottled barbecue sauce at any normal store1-2 tbs chipotle peppers in adobo sauce.';

// Zero-width match before a digit, unless that digit follows a space, "/" or "-"
const regex = /(?<![ \/-])(?=\d)/g;

const lines = text.split(regex);

console.log(lines);
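Two possible refinements, sketched as a hypothetical `toBulletList` helper. First, the lookbehind above does not exclude digits, so a multi-digit amount such as "12" would be split into "1" and "2"; adding `\d` to the lookbehind keeps it together. Second, the helper trims each piece and prefixes a "•" to get the bulleted layout from the question. The "12 oz salsa" sample string is made up for illustration.

```javascript
// Hypothetical helper: split the paragraph into ingredients, one bullet per line
function toBulletList(text) {
  // \d in the lookbehind keeps multi-digit numbers like "12" in one piece;
  // /(?<![ \/-])(?=\d)/g would split "12" into "1" and "2"
  const splitter = /(?<![\d \/-])(?=\d)/g;
  return text
    .split(splitter)
    .map((item) => item.trim())
    .filter((item) => item.length > 0)
    .map((item) => `• ${item}`)
    .join("\n");
}

// Made-up sample containing a two-digit amount
const sample = "2 lb thawed chicken breasts12 oz salsa1-2 tbs chipotle peppers in adobo sauce.";
console.log(toBulletList(sample));
// • 2 lb thawed chicken breasts
// • 12 oz salsa
// • 1-2 tbs chipotle peppers in adobo sauce.
```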
