split string based on words and highlighted portions with `^` sign

Time:11-23

I have a string in which some portions are highlighted with the ^ sign:

const inputValue = 'jhon duo ^has a car^ right ^we know^ that';

How can I split it into an array of words and ^-highlighted portions, so that the result is this array:

['jhon','duo', 'has a car', 'right', 'we know', 'that']

Using const input = inputValue.split('^'); to split on ^, or const input = inputValue.split(' '); to split on words, does not work on its own, so I think a better approach is needed.

How would you do this?

CodePudding user response:

You can use matchAll with a regular expression:

const inputValue = 'jhon duo ^has a car^ right ^we know^ that';
const result = Array.from(inputValue.matchAll(/\^(.*?)\^|([^^\s]+)/g),
                          ([, a, b]) => a || b);
console.log(result);

  • \^(.*?)\^ will match a literal ^ and all characters up to and including the next ^; the inner part is captured in a capture group
  • ([^^\s]+) will match a series of one or more non-white space characters that are not ^ (a "word") in a second capture group
  • | makes the above two patterns alternatives: if the first doesn't match, the second is tried.
  • The Array.from callback will extract only what occurs in a capture group, so excluding the ^ characters.
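
As a minimal sketch of how the pattern above behaves, it can be wrapped in a small helper. The name splitByHighlights is hypothetical and not part of the answer. The second call suggests that, with this approach, an unmatched ^ is skipped silently rather than reported:

```javascript
// Hypothetical helper name; the regex is the one described in the bullets above.
function splitByHighlights(text) {
  const pattern = /\^(.*?)\^|([^^\s]+)/g;
  // Each match is [fullMatch, highlightedPart, plainWord]; only one group is set.
  return Array.from(
    text.matchAll(pattern),
    ([, highlightedPart, plainWord]) => highlightedPart || plainWord
  );
}

console.log(splitByHighlights('jhon duo ^has a car^ right ^we know^ that'));
// [ 'jhon', 'duo', 'has a car', 'right', 'we know', 'that' ]

// The lone ^ matches neither alternative, so the regex simply moves past it.
console.log(splitByHighlights('jhon ^duo has'));
// [ 'jhon', 'duo', 'has' ]
```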

CodePudding user response:

trincot's answer is good, but here's a version that doesn't use regex and will throw an error when there are mismatched ^:

function splitHighlights (inputValue) {
  const inputSplit = inputValue.split('^');
  // Pieces alternate: plain text, highlighted, plain text, ...
  let highlighted = true;
  const result = inputSplit.flatMap(splitVal => {
    highlighted = !highlighted;
    const trimmed = splitVal.trim();
    if (trimmed === '') {
      // Skip empty or whitespace-only pieces
      return [];
    } else if (highlighted) {
      return [trimmed];
    } else {
      // Split on runs of whitespace so repeated spaces yield no empty words
      return trimmed.split(/\s+/);
    }
  });
  // An odd number of '^' leaves the last piece in the highlighted state
  if (highlighted) {
    throw new Error(`unmatched '^' char: expected an even number of '^' characters in input`);
  }
  return result;
}
console.log(splitHighlights('^jhon duo^ has a car right ^we know^ that'));
console.log(splitHighlights('jhon duo^ has^ a car right we^ know that^'));
console.log(splitHighlights('jhon duo^ has a car^ right ^we know^ that'));
// The last call has five '^' characters, so it throws
console.log(splitHighlights('jhon ^duo^ has a car^ right ^we know^ that'));
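
A caller would probably want to catch that error instead of letting it stop the script. The sketch below assumes a condensed stand-in for the function above (same idea, using the piece index instead of a toggled flag) and a hypothetical wrapper named trySplit:

```javascript
// Condensed stand-in for the function above, included so this sketch runs on its own.
function splitHighlights(inputValue) {
  const parts = inputValue.split('^');
  // An even number of parts means an odd number of '^' characters.
  if (parts.length % 2 === 0) {
    throw new Error("unmatched '^' char: expected an even number of '^' characters in input");
  }
  // Odd indexes are highlighted spans; even indexes are plain text split into words.
  return parts
    .flatMap((part, i) => (i % 2 === 1 ? [part.trim()] : part.trim().split(/\s+/)))
    .filter(Boolean);
}

// Hypothetical wrapper: returns null instead of throwing on mismatched input.
function trySplit(text) {
  try {
    return splitHighlights(text);
  } catch (err) {
    console.error(`Could not split "${text}": ${err.message}`);
    return null;
  }
}

console.log(trySplit('jhon ^duo^ has a car')); // [ 'jhon', 'duo', 'has', 'a', 'car' ]
console.log(trySplit('jhon ^duo has a car'));  // null, after the error is logged
```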

CodePudding user response:

You can still use split(), but capture the highlighted part of the split-sequence so that it is included in the output.
For splitting you could use the regex / *\^([^^]*)\^ *| +/ to get trimmed items in the results.

const inputValue = 'jhon duo ^has a car^ right ^we know^ that';

// filtering drops empty items (split-sequence at start or end) and the
// undefined entries produced when the plain-space alternative matches
let input = inputValue.split(/ *\^([^^]*)\^ *| +/).filter(Boolean);

console.log(input);

The regex matches as follows:
  •  *\^ matches any number of spaces followed by a literal caret
  • ([^^]*) captures any number of non-caret characters
  • \^ * matches a literal caret followed by any number of spaces
  • | + is the alternative: OR split at one or more spaces
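
To see why the filter step matters, the raw split() output can be inspected before filtering. This is only an illustrative sketch; the variable names are not from the answer. When the plain-space alternative matches, the capture group does not participate and split() inserts undefined, and a highlight at the very start leaves an empty string:

```javascript
const inputValue = 'jhon duo ^has a car^ right ^we know^ that';
const pattern = / *\^([^^]*)\^ *| +/;

// Captured groups are spliced into the result, including unset ones.
const raw = inputValue.split(pattern);
console.log(raw);
// [ 'jhon', undefined, 'duo', 'has a car', 'right', 'we know', 'that' ]

// A highlight at the start produces a leading empty string.
const atStart = '^has a car^ right'.split(pattern);
console.log(atStart);
// [ '', 'has a car', 'right' ]

// filter(Boolean) removes both kinds of unwanted entries.
console.log(raw.filter(Boolean));
// [ 'jhon', 'duo', 'has a car', 'right', 'we know', 'that' ]
```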