Home > front end >  Find total number of occurrences in array of arrays at a particular cumulative distance
Find total number of occurrences in array of arrays at a particular cumulative distance

Time:09-04

I am building a way to do proximity search in a positional inverted index - Implementing proximity search in positional inverted index nodejs. This is a sub-problem in that.

I have an array of arrays containing positions of different words on a page.

{
    pageno: [
        [positions of word 1],
        [positions of word 2],
        [positions of word n]
    ]
}

For eg -

{
    1 : [
        [1, 5, 6],
        [2, 41],
        [4, 7, 11]
    ],
    2 : [
        [1, 5, 6],
        [2, 41],
        [3, 7, 11]
    ]
}

I want to find, for each pageNo, the number of occurrences such that the sum of the differences between positions of words do not exceed a specified value (proximity).

If the value of proximity is 1, all the words shouldn't have more than 1 word between them. So a "Hello world nodejs" should match "Hello world in nodejs" as there is only one word in between - 'in'. But, it won't match 'hello from world in nodejs' as there are total 2 words in between- 'from' and 'in'.

Note, that jumbled words are allowed.

How to do this in JavaScript? - I was trying to do something like Finding matches between multiple JavaScript Arrays but couldn't make the necessary changes to make it work here.

The expected output for the above case would be (proximity: 2):

{
    1 : 3,
    2 : 3
}

Page 1: (1,2,4)-Proximity (2-1 -1) (4-2 -1)=1, (5,2,4)- Proximity (5-4 -1) (4-2 -1)=1 and (6,2,4)

Page 2: (1,2,3), (5,2,3), (6,2,3)

CodePudding user response:

I use "cartesian product" function from another answer. Then sort each of the result counting the maximal difference between items. Check that against proximity and increase count if legit.

Update: Corrected distance formula according to comments. Also sort here requires explicit conversion to numbers. (Why? let me know in the comments below)

var input = {
  1: [
    [1, 5, 6],
    [2, 41],
    [4, 7, 11]
  ],
  2: [
    [1, 5, 6],
    [2, 41],
    [3, 7, 11]
  ]
}

function cartesianProduct(arr) {
  return arr.reduce(function(a, b) {
    return a.map(function(x) {
      return b.map(function(y) {
        return x.concat([y]);
      })
    }).reduce(function(a, b) {
      return a.concat(b)
    }, [])
  }, [
    []
  ])
}

function distance(arr) {
  arr.sort(function(a,b) {
    return  a -  b;
  });
  var total = 0;
  for (var i = 1; i < arr.length; i  ) {
    total  = (arr[i] - arr[i - 1] - 1)
  }
  return total;
}

function count_index(arrs, proximity) {
  var product = cartesianProduct(arrs);
  var count = 0;
  product.forEach(function(series) {
    if (distance(series) <= proximity) {
      console.log("found: "   series)
      count  
    }
  })
  return count
}

var result = {}
var proximity = 2;
Object.entries(input).forEach(function([key, value]) {
  result[key] = count_index(value, proximity)
})
console.log(result)

  • Related