Typescript - Detect unique sentence in an array of words

Time:07-15

I'm trying to detect unique sentences (each given as a string) in an array of words in TypeScript, but I'm not sure I'm going about it the right way.

My first approach is to convert the array of words into one big string and then use indexOf to return the indices of the matches, e.g.:

const words: Word[] = [
  { id: 1, content: "date" },
  { id: 2, content: "of" },
  { id: 3, content: "my" },
  { id: 4, content: "birthday" },
  { id: 5, content: "date" },
  { id: 6, content: "of" },
  { id: 7, content: "his" },
  { id: 8, content: "birthday" },
];

const uniqueSentence = "date of his"

const wordsString: string = words.map((w) => w.content).join(" ");

function findText(searchStr: string, str: string) {
  var searchStrLen = searchStr.length;  
  if (searchStrLen == 0) {
    return [];
  }
  var startIndex = 0,
    index,
    indices = [];
  str = str.toLowerCase();
  searchStr = searchStr.toLowerCase();
  while ((index = str.indexOf(searchStr, startIndex)) > -1) {
    indices.push(index);
    console.log(str.substring(index));
    startIndex = index + searchStrLen;
  }
  return indices;
}

const indices = findText(uniqueSentence, wordsString)

// Here the indices will be equal to [20]

In the case of the example above, I would like the function findText to return the matching words, but I have no clue how to achieve that.

Test cases :

findText("date") => Error ("Non unique anchor")
findText("date of") => Error ("Non unique anchor")
findText("date of his") => [
  { id: 5, content: "date" },
  { id: 6, content: "of" },
  { id: 7, content: "his" },
]

CodePudding user response:

function findText(searchStr: string, str: string) {
    // get both in array form
    const splitString = str.split(" ");
    const splitSearch = searchStr.split(" ");

    let idxs: number[] = [];
    splitString.forEach((string, idx) => {
        // only a word equal to the first search word can start a match
        if (string === splitSearch[0]) {
            // take the following words as a new array, join both arrays
            // and compare for equality
            const possibleMatch = splitString.slice(
                idx,
                idx + splitSearch.length,
            );
            // if equal, push idx of first word
            splitSearch.join(" ") === possibleMatch.join(" ") &&
                idxs.push(idx);
        }
    });

    return idxs;
}
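
The function above returns start positions in the split string rather than the words themselves. A minimal sketch of mapping those positions back to the Word objects the question asks for, assuming no word's content contains a space (so positions in the split string line up with positions in the words array). The names findStartIndices, wordsAt, and sampleWords are hypothetical, and findStartIndices is a simplified variant of the function above, not the answer's exact code:

```typescript
type Word = { id: number; content: string };

// Simplified variant of findText above: returns the start position of
// every place where the search words occur consecutively.
function findStartIndices(searchStr: string, str: string): number[] {
  const splitString = str.split(" ");
  const splitSearch = searchStr.split(" ");
  const idxs: number[] = [];
  splitString.forEach((_, idx) => {
    const possibleMatch = splitString.slice(idx, idx + splitSearch.length);
    if (possibleMatch.join(" ") === splitSearch.join(" ")) idxs.push(idx);
  });
  return idxs;
}

// Hypothetical helper: turns each start position into the matching words.
// Assumes the string was built by joining words[i].content with " ".
function wordsAt(words: Word[], starts: number[], count: number): Word[][] {
  return starts.map((start) => words.slice(start, start + count));
}

const sampleWords: Word[] = [
  { id: 1, content: "date" },
  { id: 2, content: "of" },
  { id: 3, content: "my" },
  { id: 4, content: "birthday" },
  { id: 5, content: "date" },
  { id: 6, content: "of" },
  { id: 7, content: "his" },
  { id: 8, content: "birthday" },
];

const joined = sampleWords.map((w) => w.content).join(" ");
const starts = findStartIndices("date of his", joined); // [4]
const matches = wordsAt(sampleWords, starts, 3);
console.log(matches); // one group: the words with ids 5, 6, 7
```

More than one group in the result would correspond to the "Non unique anchor" case from the question.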

CodePudding user response:

Note: This answer is in response to the original question, which has since been edited in a substantial way that invalidates the answer.

If I understand the question correctly, you can do it by keeping track of the current index of each word as you iterate through them. There are some assumptions in your question (like inserting spaces between each word), and this behavior needs to be accounted for during the iteration (e.g. prepending each word with the space). Here's how you could write a function to find the index of the first word which matches the search text exactly:

TS Playground

type Word = {
  id: number;
  content: string;
};

const words: Word[] = [
  { id: 1, content: 'date' },
  { id: 2, content: 'of' },
  { id: 3, content: 'my' },
  { id: 4, content: 'birthday' },
  { id: 5, content: 'date' },
  { id: 6, content: 'of' },
  { id: 7, content: 'his' },
  { id: 8, content: 'birthday' },
];

function findIndex (
  words: Word[],
  searchText: string,
  joinString = ' ',
): number {
  if (!searchText) return -1;

  const indexes: number[] = [];
  let index = 0;
  let wordString = '';

  const iter = words[Symbol.iterator]();
  const {done, value: word} = iter.next();
  if (done) return -1;
  indexes.push(index);
  wordString += word.content;
  index += word.content.length;

  for (const word of iter) {
    indexes.push(index + joinString.length);
    wordString += joinString;
    wordString += word.content;
    index += joinString.length + word.content.length;
  }

  return indexes.indexOf(wordString.indexOf(searchText));
}

console.log(findIndex(words, '')); // -1 (not found)
console.log(findIndex(words, 'my date')); // -1
console.log(findIndex(words, 'date of his')); // 4
console.log(findIndex(words, 'of my')); // 1
console.log(findIndex(words, 'birthday')); // 3

Compiled JS from the TS Playground:

"use strict";
const words = [
    { id: 1, content: 'date' },
    { id: 2, content: 'of' },
    { id: 3, content: 'my' },
    { id: 4, content: 'birthday' },
    { id: 5, content: 'date' },
    { id: 6, content: 'of' },
    { id: 7, content: 'his' },
    { id: 8, content: 'birthday' },
];
function findIndex(words, searchText, joinString = ' ') {
    if (!searchText)
        return -1;
    const indexes = [];
    let index = 0;
    let wordString = '';
    const iter = words[Symbol.iterator]();
    const { done, value: word } = iter.next();
    if (done)
        return -1;
    indexes.push(index);
    wordString += word.content;
    index += word.content.length;
    for (const word of iter) {
        indexes.push(index + joinString.length);
        wordString += joinString;
        wordString += word.content;
        index += joinString.length + word.content.length;
    }
    return indexes.indexOf(wordString.indexOf(searchText));
}
console.log(findIndex(words, '')); // -1 (not found)
console.log(findIndex(words, 'my date')); // -1
console.log(findIndex(words, 'date of his')); // 4
console.log(findIndex(words, 'of my')); // 1
console.log(findIndex(words, 'birthday')); // 3

CodePudding user response:

I think you should loop over the array instead of joining it. That way you already have the index and the object and can simply return them:

const words = [
  { id: 1, content: "date" },
  { id: 2, content: "of" },
  { id: 3, content: "my" },
  { id: 4, content: "birthday" },
  { id: 5, content: "date" },
  { id: 6, content: "of" },
  { id: 7, content: "his" },
  { id: 8, content: "birthday" },
];

const uniqueSentence = "date of"

const uniq = uniqueSentence.split(' ')
const sentences = words
      // instead of map/filter you can use forEach, define an array and push into that array
      .map((a, i) => {
         if (
            // the first word matches the current position of the array
            a.content === uniq[0]
            // check that the next words exactly match the entire sentence
            && words.slice(i, i + uniq.length).map(a => a.content).join(' ') === uniqueSentence)
            // return all the words in the sentence
          { return words.slice(i, i + uniq.length) }
          // no match, so return false for the filter function
          else return false })
       // remove misses
      .filter(a => a);

// The result will be an array of arrays, where each inner array represents one found instance of the sentence with all its words
console.log(sentences)

Edit: changed the code slightly to reflect the changed question.

CodePudding user response:

This fairly simple version finds all matches in the original list. We wrap it in a function which calls it and responds with the one found array, your "Non unique anchor" text, or "Not found", depending on how many instances it found. It's written in plain JS; I leave the TS to you. I also simply return a string in the exception cases. If you want to return an error, throw an exception, or do something else, that should be pretty easy to do.

const _findText = (words, s, ws = s .split (/\s+/)) => 
  [... words .keys ()] 
    .filter (i => ws .every ((w, j) => words [i + j] ?.content == w))
    .map (i => ws .map ((_, j) => words [i + j]))

const findText = (words, s, groups = _findText (words, s)) =>
  groups .length > 1
    ? 'Non unique anchor'
  : groups .length == 1
    ? groups [0]
  : 'Not found'

const words = [{id: 1, content: "date"},  {id: 2, content: "of"}, {id: 3, content: "my"}, {id: 4, content: "birthday"}, {id: 5, content: "date"}, {id: 6, content: "of"}, {id: 7, content: "his"}, {id: 8, content: "birthday"}];

['date of', 'date of his', 'our birthday'] .forEach (s => console .log (s, ':', findText (words, s)))

Note that the important work is all done in the internal _findText function. The main function, findText, simply supplies the public wrapper to handle your non-unique response. If you haven't seen it before, Array.prototype.keys yields an iterator for the numeric indices, so when we wrap it in [... words .keys ()] we get back [0, 1, 2, 3, 4, 5, 6, 7], the indices of the elements in words.
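
For readers who want the typed version with thrown errors, here is one possible TypeScript rendering of the same idea. It is a sketch, not the code above: the names Word, findGroups, findUniqueText, and demoWords are assumptions, the indices come from map instead of keys, and the two failure cases throw an Error instead of returning a string.

```typescript
type Word = { id: number; content: string };

// Every run of consecutive words whose content matches the phrase.
function findGroups(words: Word[], phrase: string): Word[][] {
  const ws = phrase.trim().split(/\s+/).filter((w) => w.length > 0);
  if (ws.length === 0) return [];
  return words
    .map((_, i) => i)
    // optional chaining keeps i + j past the end of the list from throwing
    .filter((i) => ws.every((w, j) => words[i + j]?.content === w))
    .map((i) => words.slice(i, i + ws.length));
}

// Public wrapper: returns the single match, throws in every other case.
function findUniqueText(words: Word[], phrase: string): Word[] {
  const groups = findGroups(words, phrase);
  if (groups.length > 1) throw new Error("Non unique anchor");
  if (groups.length === 0) throw new Error("Not found");
  return groups[0];
}

const demoWords: Word[] = [
  { id: 1, content: "date" },
  { id: 2, content: "of" },
  { id: 3, content: "my" },
  { id: 4, content: "birthday" },
  { id: 5, content: "date" },
  { id: 6, content: "of" },
  { id: 7, content: "his" },
  { id: 8, content: "birthday" },
];

console.log(findUniqueText(demoWords, "date of his").map((w) => w.id)); // [5, 6, 7]

try {
  findUniqueText(demoWords, "date of");
} catch (e) {
  console.log((e as Error).message); // Non unique anchor
}
```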

The running time is O (m * n), where m is the length of the word-list and n the number of words in the search string. There are some sophisticated techniques that could speed this up a bit (akin to how the regex /aab/ doesn't need to start over at the beginning when it encounters three consecutive as.) But those would be a lot more work and would only help in the case you have a huge word-list with a vast number of repeated words.
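
The paragraph above does not name a specific technique. One well-known candidate is the Knuth-Morris-Pratt algorithm, applied here to words instead of characters; treating it as the intended technique is an assumption. The sketch below finds all start indices in O(m + n) by never re-reading a word of the list, and the names buildFailureTable, kmpSearch, and kmpWords are hypothetical:

```typescript
type Word = { id: number; content: string };

// fail[i] is the length of the longest proper prefix of pattern[0..i]
// that is also a suffix of it.
function buildFailureTable(pattern: string[]): number[] {
  const fail: number[] = pattern.map(() => 0);
  let k = 0;
  for (let i = 1; i < pattern.length; i++) {
    while (k > 0 && pattern[i] !== pattern[k]) k = fail[k - 1];
    if (pattern[i] === pattern[k]) k++;
    fail[i] = k;
  }
  return fail;
}

// Returns the start index of every occurrence of pattern in words.
function kmpSearch(words: Word[], pattern: string[]): number[] {
  if (pattern.length === 0) return [];
  const fail = buildFailureTable(pattern);
  const found: number[] = [];
  let k = 0; // number of pattern words matched so far
  for (let i = 0; i < words.length; i++) {
    // on a mismatch, fall back to the longest prefix that still matches
    while (k > 0 && words[i].content !== pattern[k]) k = fail[k - 1];
    if (words[i].content === pattern[k]) k++;
    if (k === pattern.length) {
      found.push(i - k + 1);
      k = fail[k - 1]; // continue, so overlapping matches are found too
    }
  }
  return found;
}

const kmpWords: Word[] = ["date", "of", "my", "birthday", "date", "of", "his", "birthday"]
  .map((content, i) => ({ id: i + 1, content }));

console.log(kmpSearch(kmpWords, ["date", "of", "his"])); // [4]
console.log(kmpSearch(kmpWords, ["date", "of"])); // [0, 4]
```

For a word list as short as the one in the question, the simple version is likely the better choice; the extra bookkeeping only pays off on large inputs with many repeated words.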
