Split a string according to tags and text with regex

Time:11-29

I need to parse a string that contains both plain text and specific tags.

The expected result is an array in which text segments and tags are separate items.

An example of a string to parse

There is user [[user-foo]][[/user-foo]] and user [[user-bar]]label[[/user-bar]].

Some information:

  • The user- prefix of the tag is static.
  • The part that follows (foo or bar) is dynamic and can be any string.
  • The same goes for the text parts.
  • Tags can contain text as a child.

Expected result

[
  'There is user ',
  '[[user-foo]][[/user-foo]]',
  ' and user ',
  '[[user-bar]]label[[/user-bar]]',
  '.'
]

What I tried

Here is a regex I created:

/\[\[user-[^\]]+]][A-Za-z]*\[\[\/user-[^\]]+\]\]/g

It's visible/editable here: https://regex101.com/r/ufwVV1/1

It identifies all the tag parts and returns two matches, one for each of my two tags. However, the text content is not included, and I don't know whether this first approach is correct.
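
For reference, here is a minimal sketch of what such a pattern returns, assuming the quantifier after each `[^\]]` is `+`. `String.prototype.match` with the `g` flag returns only the matched substrings, which is why the surrounding text is missing. The names `input`, `tagPattern` and `tagsOnly` are illustrative only.

```javascript
const input = 'There is user [[user-foo]][[/user-foo]] and user [[user-bar]]label[[/user-bar]].';

// Assumed pattern: opening tag, optional letters as child text, closing tag
const tagPattern = /\[\[user-[^\]]+\]\][A-Za-z]*\[\[\/user-[^\]]+\]\]/g;

// With the g flag, match() returns the matched substrings and nothing else
const tagsOnly = input.match(tagPattern);
console.log(tagsOnly);
// [ '[[user-foo]][[/user-foo]]', '[[user-bar]]label[[/user-bar]]' ]
```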

CodePudding user response:

There may be a more efficient solution, but at least this works.

  1. Get the tags using a regex
  2. Get each tag's position (start/end) within the string
  3. Use those positions to slice the string

const string = "There is user [[user-foo]][[/user-foo]] and user [[user-bar]]label[[/user-bar]]."

// Get the tags using regex
const matches = string.match(/\[\[[a-z-\/]+\]\]/g)
console.log(matches)

// Get the tag positions (start/end) within the string
// Note: indexOf returns the first occurrence, so this assumes each tag string appears only once
const matchPositions = matches.map((match) => ({start: string.indexOf(match), end: string.indexOf(match) + match.length}))
console.log(matchPositions)

// Use those positions against the string
let currentPos = 0
let result = []
for(let i=0; i<matchPositions.length; i+=2){
  const position = matchPositions[i]
  const secondPosition = matchPositions[i+1]
  
  // Get the substring in front of the current tag (if any)
  if(position.start !== currentPos){
    const firstSubString = string.slice(currentPos, position.start)
    if(firstSubString !== ""){
      result.push(firstSubString)
    }
  }
  
  // Get the substring from the opening tag start to the closing tag end
  result.push(string.slice(position.start, secondPosition.end))
  currentPos = secondPosition.end
  
  // Get the substring at the end of the string (if any)
  if(i === matchPositions.length-2){
    const lastSubString = string.slice(secondPosition.end)
    if(lastSubString !== ""){
      result.push(lastSubString)
    }
    
  }
}

console.log(result)
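
One caveat: `indexOf` always returns the first occurrence, so if the same tag appears twice in the string, both occurrences get the same position. A possible way around this, sketched below, is to read the positions from `matchAll`, which exposes an `index` for every match. The sample string `source` and the name `positions` are made up for illustration.

```javascript
// Hypothetical input in which the same tag pair appears twice
const source = 'Hi [[user-foo]][[/user-foo]] and again [[user-foo]][[/user-foo]]!';

// matchAll yields one match object per occurrence, each with its own index
const positions = [...source.matchAll(/\[\[[a-z\/-]+\]\]/g)].map((m) => ({
  start: m.index,
  end: m.index + m[0].length,
}));

console.log(positions.length); // 4
console.log(source.slice(positions[2].start, positions[2].end)); // [[user-foo]]
```

The rest of the loop can stay the same, since it only relies on the `start` and `end` fields.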

CodePudding user response:

Here is my solution, inspired by @louys-patrice-bessette's answer.

const string = 'There is user [[user-foo]][[/user-foo]] and user [[user-bar]]label[[/user-bar]].';
const regex = /\[\[user-[^\]]+\]\][A-Za-z0-9_ ]*\[\[\/user-[^\]]+\]\]/g;

const { index, items } = [...string.matchAll(regex)].reduce(
    (result, regExpMatchArray) => {
      const [match] = regExpMatchArray;
      const { index: currentIndex } = regExpMatchArray;

      if (currentIndex === undefined) {
        return result;
      }

      return {
        items: [
          ...result.items,
          string.substring(result.index, currentIndex),
          match,
        ],
        index: currentIndex + match.length,
      };
    },
    {
      index: 0,
      items: [],
    }
  );

if (index !== string.length) {
  items.push(string.substring(index, string.length));
}

console.log(items);
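
As a more compact alternative (not what either answer above does), `String.prototype.split` keeps the separators in its output when the pattern is wrapped in a capturing group. The sketch below assumes that tags are never nested and that the child text contains no line breaks. The names `text`, `tagSplitter` and `parts` are illustrative.

```javascript
const text = 'There is user [[user-foo]][[/user-foo]] and user [[user-bar]]label[[/user-bar]].';

// One capturing group around the whole tag pair, so split() keeps each tag as an item.
// The lazy .*? stops at the first closing tag (assumes no nested tags, no newlines).
const tagSplitter = /(\[\[user-[^\]]+\]\].*?\[\[\/user-[^\]]+\]\])/;

// Drop the empty strings produced when a tag sits at the very start or end
const parts = text.split(tagSplitter).filter((part) => part !== '');

console.log(parts);
// [ 'There is user ', '[[user-foo]][[/user-foo]]', ' and user ',
//   '[[user-bar]]label[[/user-bar]]', '.' ]
```

Only one capturing group is used on purpose: every additional group in the pattern would add its own captured text to the array returned by `split`.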
