Home > OS >  Longest prefix that match more than 50% of items in an array
Longest prefix that match more than 50% of items in an array

Time:10-17

Suppose I have array of strings: ["apple", "ape", "application", "item"]. I need to find the longest common prefix that matches more than 50% of the strings in that array.

For example, we got "ap" being the prefix of 3 strings in a 4 elements array -> so more than 50% of the array -> returns the prefix. This is my attempt:

const longestPrefix = (arrStr) => {
  if (arrStr.length === 0) return "";

  let prefix = "";
  let noPrefixMatched = {};
  // loop the characters of first word
  for (let i = 0; i < arrStr[0].length; i  ) {
    let char = arrStr[0][i];
    let j = 0;
    // loop all the words of the array except first one
    for (j = 1; j < arrStr.length; j  ) {
      if (arrStr[j][i] !== char) {
        // if first char not matched then set that word as notMatched
        if (i === 0) {
          noPrefixMatched[arrStr[j]] = true;
        }
        break;
      }
      // if the first characters are equal in all words and the loop reach the final word
      if (arrStr[j][i] === char && j === arrStr.length - 1) {
        prefix  = char;
      }
    }
  }

  return prefix;
};

I try to get the most common prefix by vertical comparison, but it's not working with a word without any prefix in the array (like "item" in the above array). How should I do this?

CodePudding user response:

One way to do this is to iterate all the words, constructing prefixes one letter at a time and counting the occurrence of each prefix as you see it. You can then filter that result based on the count being greater than 1/2 the length of the input array, and finally sort it by the prefix length descending, returning the first entry in the result:

const words = ["apple", "ape", "application", "item"]

const counts = words.reduce((acc, w) => {
  prefix = ''
  for (i = 0; i < w.length; i  ) {
    prefix  = w[i]
    acc[prefix] = (acc[prefix] || 0)   1
  }
  return acc
}, {})

const bestPrefix = Object.entries(counts)
  .filter(([_, v]) => v > words.length / 2.0)
  .sort((a, b) => b[0].length - a[0].length)
  [0][0]
  
console.log(bestPrefix)

CodePudding user response:

The first word should not be considered special or be hardcoded in any way - it may not even be a match for the substring selected, after all.

A simple way to code this would be to iterate over all words and their possible prefixes and see which prefixes pass the test, while keeping the best one in an outer variable - then return it at the end.

const getPrefixes = str => Array.from(str, (_, i) => str.slice(0, i   1));
const matches = (arr, prefix) => {
  const matchCount = arr.reduce((a, str) => a   str.startsWith(prefix), 0);
  return matchCount / arr.length > 0.5;
};
const longestPrefix = (arrStr) => {
  let bestPrefix = '';
  for (const str of arrStr) {
    for (const prefix of getPrefixes(str)) {
      if (prefix.length > bestPrefix.length && matches(arrStr, prefix)) {
        bestPrefix = prefix;
      }
    }
  }
  return bestPrefix;
};
console.log(longestPrefix(["apple", "ape", "application", "item"]));

A less computationally complex but more complicated method would be to construct a tree structure of characters from each string in the input, and then iterate through the tree to identify which nodes have enough nested children, and then pick the longest such node. This has the advantage of only requiring iterating over each character of each string in the input once.

const getBestChild = (obj, totalRequired, prefix = '') => {
  if (obj.count < totalRequired) return;
  const thisResult = { count: obj.count, prefix };
  if (!obj.children) {
    return thisResult;
  }
  let bestChild = thisResult;
  for (const [nextChar, child] of Object.entries(obj.children)) {
    const result = getBestChild(child, totalRequired, prefix   nextChar);
    if (result && result.prefix.length > bestChild.prefix.length) {
      bestChild = result;
    }
  }
  return bestChild;
};
const longestPrefix = (arrStr) => {
  const root = {};
  for (const str of arrStr) {
    let obj = root;
    for (const char of str) {
      if (!obj.children) {
        obj.children = {};
      }
      const { children } = obj;
      if (!children[char]) {
        children[char] = { count: 0 };
      }
      children[char].count  ;
      obj = children[char];
    }
  }
  const { length } = arrStr;
  const totalRequired = length % 2 === 0 ? (1   length / 2) : Math.ceil(length / 2);
  return getBestChild(root, totalRequired).prefix;
};
console.log(longestPrefix(["apple", "ape", "application", "item"]));

  • Related