How to get custom tags from one text and put them into another text?

The title may not be easy to understand, so hopefully the details below make it clear.

I have the sentence data below, which contains some tags of the form [tn]tag[/tn]:

const sentence = `[t1]Sometimes[/t1] that's [t2]just the way[/t2] it has to be. Sure, there
 were [t3]probably[/t3] other options, but he didn't let them [t4]enter his mind[/t4]. It 
was done and that was that. It was just the way [t5]it[/t5] had to be.`

And I have the parts of the sentence:

const parts = [
    "Sometimes that's just the way",
    "it has to be",
    "Sure,",
    "there were probably other options,",
    "but he didn't let them enter his mind.",
    "It was done and that was that.",
    "It was just the way it had to be."
];

The goal is to add the tags to each part, using the sentence above:

const expectedOutput = [
    "[t1]Sometimes[/t1] that's [t2]just the way[/t2]",
    "it has to be",
    "Sure,",
    "there were [t3]probably[/t3] other options,",
    "but he didn't let them [t4]enter his mind[/t4].",
    "It was done and that was that.",
    "It was just the way [t5]it[/t5] had to be."
];

Here is what I've tried so far, but it doesn't seem to make sense and I ended up with nothing:

Make a clone of the sentence and remove all tags (code below).
Find all the parts in the sentence.
[The problem is that I don't know how to put the tags back.]

Is there any way to achieve this, and if so, how? Thanks.

export const removeTags = (content) => {
  // Strip both opening [tN] and closing [/tN] tags.
  return content.replace(/\[\/?t\d+\]/g, '');
};

CodePudding user response：

For a regex answer: /\[t\d+\]([^[]*)\[\/t\d+\]/g matches each tagged span, tags included, and captures the text inside the tags in a group.

let regex = /\[t\d+\]([^[]*)\[\/t\d+\]/g;
let matches = [], tags = [];
var match = regex.exec(sentence);
while (match != null) {
    tags.push(match[0]);
    matches.push(match[1]);
    match = regex.exec(sentence);
}

Now we just need to replace each match with its tagged version inside parts:

let lastSeen = 0;
for (let i = 0; i < parts.length; i++) {
    for (let j = lastSeen; j < matches.length; j++) {
        if (parts[i].includes(matches[j])) {
            lastSeen++;
            parts[i] = parts[i].replaceAll(matches[j], tags[j])
        } else if (j > lastSeen) {
            break;
        }
    }
}

Here is a link to see the regex: regex101

And here is a JSFiddle to see the whole thing JSFiddle
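
For reference, the two steps above could be combined into one function that leaves the input array untouched. This is only a sketch: the name tagParts is made up for illustration, and the sentence is written on a single line on the assumption that the line breaks in the question came from copy/paste.

```javascript
// Hypothetical helper (name not from the answer): collects the tags, then
// applies them to copies of the parts instead of mutating the array.
function tagParts(sentence, parts) {
  const regex = /\[t\d+\]([^[]*)\[\/t\d+\]/g;
  const tags = [];    // full matches, e.g. "[t1]Sometimes[/t1]"
  const matches = []; // inner text only, e.g. "Sometimes"
  for (const m of sentence.matchAll(regex)) {
    tags.push(m[0]);
    matches.push(m[1]);
  }

  let lastSeen = 0; // index of the next tag that has not been used yet
  return parts.map((part) => {
    let tagged = part;
    for (let j = lastSeen; j < matches.length; j++) {
      if (tagged.includes(matches[j])) {
        lastSeen++;
        tagged = tagged.replaceAll(matches[j], tags[j]);
      } else if (j > lastSeen) {
        break; // do not look more than one tag ahead
      }
    }
    return tagged;
  });
}

const sentence = "[t1]Sometimes[/t1] that's [t2]just the way[/t2] it has to be. Sure, there were [t3]probably[/t3] other options, but he didn't let them [t4]enter his mind[/t4]. It was done and that was that. It was just the way [t5]it[/t5] had to be.";

const parts = [
  "Sometimes that's just the way",
  "it has to be",
  "Sure,",
  "there were probably other options,",
  "but he didn't let them enter his mind.",
  "It was done and that was that.",
  "It was just the way it had to be."
];

const tagged = tagParts(sentence, parts);
console.log(tagged);
```

Like the loop above, this relies on the tags appearing in the parts in the same order as in the sentence.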

CodePudding user response：

I also made an alternative version, so I'll just drop it below. No nesting, as in @thchp's answer, but a bit easier to read imo.

const sentence = "[t1]Sometimes[/t1] that's [t2]just the way[/t2] it has to be. Sure, there " +
 "were [t3]probably[/t3] other options, but he didn't let them [t4]enter his mind[/t4]. It " +
 "was done and that was that. It was just the way [t5]it[/t5] had to be.";

const parts = [
    "Sometimes that's just the way",
    "it has to be",
    "Sure,",
    "there were probably other options,",
    "but he didn't let them enter his mind.",
    "It was done and that was that.",
    "It was just the way it had to be."
];

const getTokens = (text) => {
  // match() returns null when there are no tags, so fall back to an empty list.
  const tokens = text.match(/\[t[0-9]+\]/gm) || [];
  const result = [];

  tokens.forEach(tokenOpen => {
    // "[t1]" -> "[/t1]"
    const tokenClose = "[/" + tokenOpen.substring(1);
    const tokenStart = text.indexOf(tokenOpen) + tokenOpen.length;
    const tokenEnd = text.indexOf(tokenClose);
    result.push({
      tokenOpen,
      tokenClose,
      value: text.substring(tokenStart, tokenEnd)
    });
  });

  return result;
}

const applyTokens = (parts, tokens) => {
    return parts.map(part => {
        const match = tokens.filter(x => part.includes(x.value));

        if (!match.length)
            return part;

        // Only the first matching token is applied to the part.
        const {value, tokenOpen, tokenClose} = match[0];
        const index = part.indexOf(value);
        const partPre = part.substring(0, index);
        const partPost = part.substring(index + value.length);
        return partPre + tokenOpen + value + tokenClose + partPost;
    });
}

const output = applyTokens(parts, getTokens(sentence));

console.log(output);

Tokens are matched by value only, so the "it" in the second element of the "parts" array gets wrapped as well. Each part also receives only its first matching token. If you don't want the stray match, remove each token once it has been used in "applyTokens".
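
The idea of dropping a token once it has been used could be sketched roughly as follows. The name applyTokensOnce is hypothetical, and the token list is written out by hand in the shape getTokens returns so that the snippet runs on its own. It assumes the tags occur in the parts in the same order as in the sentence, and it still matches by plain substring, so a short value such as "it" could in principle be found inside a longer word.

```javascript
// Tokens in the shape produced by getTokens (written out so the sketch is standalone).
const tokens = [
  { tokenOpen: "[t1]", tokenClose: "[/t1]", value: "Sometimes" },
  { tokenOpen: "[t2]", tokenClose: "[/t2]", value: "just the way" },
  { tokenOpen: "[t3]", tokenClose: "[/t3]", value: "probably" },
  { tokenOpen: "[t4]", tokenClose: "[/t4]", value: "enter his mind" },
  { tokenOpen: "[t5]", tokenClose: "[/t5]", value: "it" }
];

const parts = [
  "Sometimes that's just the way",
  "it has to be",
  "Sure,",
  "there were probably other options,",
  "but he didn't let them enter his mind.",
  "It was done and that was that.",
  "It was just the way it had to be."
];

// Hypothetical variant of applyTokens: every token is used at most once, in order.
const applyTokensOnce = (parts, tokens) => {
  const remaining = [...tokens]; // work on a copy so the caller's list is untouched
  return parts.map((part) => {
    let result = part;
    let searchFrom = 0;
    // Only the next unused token is tried, so an earlier part cannot take a later tag.
    while (remaining.length > 0) {
      const { value, tokenOpen, tokenClose } = remaining[0];
      const index = result.indexOf(value, searchFrom);
      if (index === -1) break;
      const wrapped = tokenOpen + value + tokenClose;
      result = result.slice(0, index) + wrapped + result.slice(index + value.length);
      searchFrom = index + wrapped.length; // continue after the tag just inserted
      remaining.shift();
    }
    return result;
  });
};

const taggedOnce = applyTokensOnce(parts, tokens);
console.log(taggedOnce);
```

With this variant the second part stays untagged, because [t5] is not the next unused token when that part is processed, and a part containing several tagged values receives all of them.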

CodePudding user response：

Here is a solution that assumes there are no nested tags and that every tag opens and closes within a single part. It also assumes that every character of the sentence appears in the parts. For that last assumption, I had to add the "." after "it has to be" in the second expected part. I also had to remove the newline characters from the sentence, but I think they only came from the copy/paste. The solution loops through all characters and maintains two parallel buffers: one with the tags and one without. The second is compared against the parts, and the first is used to generate the output.

const sentence = `[t1]Sometimes[/t1] that's [t2]just the way[/t2] it has to be. Sure, there were [t3]probably[/t3] other options, but he didn't let them [t4]enter his mind[/t4]. It was done and that was that. It was just the way [t5]it[/t5] had to be.`


const parts = [
  "Sometimes that's just the way",
  "it has to be.",
  "Sure,",
  "there were probably other options,",
  "but he didn't let them enter his mind.",
  "It was done and that was that.",
  "It was just the way it had to be."
];

let bufferWithoutTags = ""
let bufferWithTags = ""
const output = []
const buffers = []
let tagOpened = false

for (let i = 0; i < sentence.length; ++i) {
  let c = sentence[i]
  bufferWithTags += c
  if (c === '[') {
    // A "[/" while a tag is open closes it; any other "[" opens one.
    if (tagOpened && sentence[i + 1] === "/") {
      tagOpened = false
    } else {
      tagOpened = true
    }
    // Copy the rest of the tag into the tagged buffer only.
    while (c != ']') {
      c = sentence[++i]
      bufferWithTags += c
    }
  } else {
    bufferWithoutTags += c;
  }
  // Outside a tag, check whether the untagged buffer now equals a part.
  if (!tagOpened) {
    for (const part of parts) {
      if (part === bufferWithoutTags.trim()) {
        output.push(bufferWithTags.trim())
        bufferWithTags = bufferWithoutTags = ""
      }
    }
  }
}
console.log(output)