split text string by array of properties

Time:03-01

I have a block of text that needs several kinds of formatting. The text and the formatting tags are stored separately, in a string and an array respectively. I want to create a data structure that holds the two together. Here is the actual working data:

Formatted text:

this text is bold while this is italic and this is bold too.

Text string:

this text is bold while this is italic and this is bold too.

Format tags array:

{
  "bold":[
    [13,16],
    [51,54]
  ],
  "italic":[
    [32,37]
  ]
}

Note that the format tags array holds the start and end offsets of each formatted span, grouped by formatting type.
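
Judging by the sample data, the offsets look zero-based with an inclusive end index: [13,16] covers the four characters of "bold". A minimal sketch to check that reading, where `sliceInclusive` is a hypothetical helper name and not part of the original data format:

```javascript
// Assumption: offsets are zero-based and the end index is inclusive.
const text = 'this text is bold while this is italic and this is bold too.';
const tags = { bold: [[13, 16], [51, 54]], italic: [[32, 37]] };

// Hypothetical helper: extract the text covered by one [start, end] pair.
// slice() takes an exclusive end, hence the + 1.
const sliceInclusive = (str, [start, end]) => str.slice(start, end + 1);

const boldWords = tags.bold.map(range => sliceInclusive(text, range));
const italicWords = tags.italic.map(range => sliceInclusive(text, range));
console.log(boldWords, italicWords);
```

If the end index were exclusive, each extracted word would be one character short, so any splitting code has to add 1 when it calls slice() or substring().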

Now the question is: how can I merge these two pieces of information into one object that holds both? A generic format that can be converted into HTML and Markdown would be appreciated. I was thinking about:

[
  {text: "this text is ", tags: []},
  {text: "bold", tags: ["bold"]},
  {text: " while this is ", tags: []},
  {text: "italic", tags: ["italic"]},
  {text: " and this is ", tags: []},
  {text: "bold", tags: ["bold"]},
  {text: " too.", tags: []}
]

Also note that it is possible for one text slice to have multiple tags.
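
As a rough check that a list of {text, tags} pieces is enough for both targets, here is a minimal sketch of two converters. The names `toHtml`, `toMarkdown`, `escapeHtml`, and the tag-to-markup tables are illustrative assumptions, only "bold" and "italic" are mapped, and the Markdown side does not escape special characters. The sample list includes the leading space and the trailing " too." so the pieces concatenate back to the original string.

```javascript
// Assumed tag vocabulary: only "bold" and "italic"; unknown tags are ignored.
const HTML_TAGS = new Map([['bold', 'strong'], ['italic', 'em']]);
const MD_MARKS = new Map([['bold', '**'], ['italic', '*']]);

// Escape the characters that would otherwise be read as HTML markup.
const escapeHtml = s =>
  s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');

// Wrap each piece in one element per known tag, then join the pieces.
const toHtml = segments => segments.map(({ text, tags }) =>
  tags.filter(t => HTML_TAGS.has(t)).reduce(
    (inner, t) => `<${HTML_TAGS.get(t)}>${inner}</${HTML_TAGS.get(t)}>`,
    escapeHtml(text))
).join('');

// Surround each piece with one marker pair per known tag.
const toMarkdown = segments => segments.map(({ text, tags }) =>
  tags.filter(t => MD_MARKS.has(t)).reduce(
    (inner, t) => `${MD_MARKS.get(t)}${inner}${MD_MARKS.get(t)}`,
    text)
).join('');

const segments = [
  { text: 'this text is ', tags: [] },
  { text: 'bold', tags: ['bold'] },
  { text: ' while this is ', tags: [] },
  { text: 'italic', tags: ['italic'] },
  { text: ' and this is ', tags: [] },
  { text: 'bold', tags: ['bold'] },
  { text: ' too.', tags: [] },
];

console.log(toHtml(segments));
console.log(toMarkdown(segments));
```

A piece carrying both tags is simply wrapped twice, which is why a flat list with a tags array per piece can represent multiple formats on one slice.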

CodePudding user response:

Possible solution:

/*
Groups range tags
@param {Object} tagsMap
@returns {Object} rangeTagsMap
*/
const _groupRangeTags = tagsMap => 
  Object.entries(tagsMap).reduce((map, [tag, ranges]) => {
    ranges.forEach(range => map[range] = [...(map[range] ?? []), tag]);   
    return map;
  }, {});
  
/*
Lists sorted range objects with tags
@param {Object} rangeTagsMap
@returns {Array} rangeTagsList
*/
const _listRangesWithTags = rangeTagsMap =>
  Object.entries(rangeTagsMap)
    .map(([range, tags]) => {
      const [start, end] = range.split(',');
      // Object keys are strings, so convert the offsets back to numbers.
      return { start: Number(start), end: Number(end), tags };
    })
    .sort(({ start: a }, { start: b }) => a - b);

/*
Returns range objects with tags including the ones without tags
@param {Array} rangeTagsList
@param {String} str
@returns {Array} strRangeTagsList
*/
const _fillRangesWithoutTags = (rangeTagsList, str) => {
  const strRangeTagsList = [];
  // No tagged ranges at all: the whole string is one untagged range.
  if (rangeTagsList.length === 0) {
    strRangeTagsList.push({ start: 0, end: str.length - 1, tags: [] });
  }
  for (let i = 0; i < rangeTagsList.length; i++) {
    const current = rangeTagsList[i], next = rangeTagsList[i + 1];
    strRangeTagsList.push(current);
    // Untagged text before the first tagged range.
    if (i === 0 && current.start !== 0) {
      strRangeTagsList.unshift({ start: 0, end: current.start - 1, tags: [] });
    }
    // Untagged gap between two tagged ranges (end offsets are inclusive).
    if (next && current.end + 1 < next.start) {
      strRangeTagsList.push({ start: current.end + 1, end: next.start - 1, tags: [] });
    }
    // Untagged text after the last tagged range.
    if (i === rangeTagsList.length - 1 && current.end < str.length - 1) {
      strRangeTagsList.push({ start: current.end + 1, end: str.length - 1, tags: [] });
    }
  }
  return strRangeTagsList;
}

/*
Returns string range objects with text and tags
@param {Array} strRangeTagsList
@param {String} str
@returns {Array} strRanges
*/
const _getTextRanges = (strRangeTagsList, str) => strRangeTagsList.map(({ start, end, tags }) => ({ 
  // end is inclusive, substring() takes an exclusive end
  text: str.substring(start, end + 1), tags 
}));

/*
@param {String} str
@param {Object} tagsMap
@returns {Array} strRanges
*/
const _getRanges = (str, tagsMap = {}) => {
  const rangeTagsMap = _groupRangeTags(tagsMap);
  const rangeTagsList = _listRangesWithTags(rangeTagsMap);
  const strRangeTagsList = _fillRangesWithoutTags(rangeTagsList, str);
  return _getTextRanges(strRangeTagsList, str);
}
  
console.log( _getRanges('this text is bold while this is italic and this is bold too.', { bold: [ [13,16], [51,54] ], italic: [ [32,37] ] }) );
console.log( _getRanges('this is bold while this is both bold and italic.', { bold: [[8,11],[32,46]], italic: [[32,46]] }) );
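
The functions above combine tags only when two ranges are identical, as in the second call, and appear to assume that ranges otherwise do not overlap. If ranges can overlap partially, one alternative is to cut the string at every range boundary and then work out which tags cover each piece. The following is a hedged sketch of that idea, not part of the solution above; `splitByBoundaries` is a hypothetical name, and it assumes zero-based offsets with an inclusive end.

```javascript
// Sketch: split at every range boundary so partially overlapping ranges work.
// Assumption: ranges are [start, end] pairs, zero-based, with an inclusive end.
const splitByBoundaries = (str, tagsMap = {}) => {
  const clamp = n => Math.min(str.length, Math.max(0, n));

  // Collect cut points as half-open offsets: each range adds start and end + 1.
  const cuts = new Set([0, str.length]);
  for (const ranges of Object.values(tagsMap)) {
    for (const [start, end] of ranges) {
      cuts.add(clamp(start));
      cuts.add(clamp(end + 1));
    }
  }
  const points = [...cuts].sort((a, b) => a - b);

  const segments = [];
  for (let i = 0; i + 1 < points.length; i++) {
    const from = points[i], to = points[i + 1];
    // A tag applies when one of its ranges covers the whole piece [from, to).
    const tags = Object.entries(tagsMap)
      .filter(([, ranges]) => ranges.some(([s, e]) => s <= from && to <= e + 1))
      .map(([tag]) => tag);
    const last = segments[segments.length - 1];
    if (last && last.tags.join() === tags.join()) {
      // Same tag set as the previous piece: extend it instead of adding a new one.
      last.text += str.slice(from, to);
    } else {
      segments.push({ text: str.slice(from, to), tags });
    }
  }
  return segments;
};

console.log(splitByBoundaries('abcdefghij', { bold: [[0, 5]], italic: [[3, 8]] }));
```

Here "def" falls inside both ranges, so it comes out as its own piece tagged with both bold and italic, while "abc" and "ghi" keep a single tag each.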
