Optimal way to find union of two data sets

Time:07-18

I am working on a side project that compares two different databases, and I want to find the elements the two data sets have in common (strictly speaking their intersection rather than their union), matched on the "id" field. Is there a better solution than two nested for loops, for example one that uses a hash map? Many thanks! Below is the sample code I am working with.

// data set 1
const set1 = [
  {
    id: "001",
    name: "bob",
    age: "50",
    location: "texas"
  },
    {
    id: "002",
    name: "bill",
    age: "51",
    location: "texas"
  },
    {
    id: "003",
    name: "ben",
    age: "52",
    location: "texas"
  },
    {
    id: "004",
    name: "cam",
    age: "53",
    location: "texas"
  },
    {
    id: "005",
    name: "max",
    age: "54",
    location: "texas"
  }
]

// data set 2
const set2 = [
  {
    id: "001",
    name: "bob"
  },
  {
    id: "002",
    name: "bill"
  }
]

// I want to create a function that finds the common elements of the two lists based on id,
// puts the matching elements of data set 1 into a list, and returns that list

function findUnion(set1, set2) {
  // logic here, I know I can do a nested for loop but is there a more efficient way such as 
  // using a hashmap? ( Map() object? ) 
}

// desired output 
const output = [
  {
    id: "001",
    name: "bob",
    age: "50",
    location: "texas"
  },
    {
    id: "002",
    name: "bill",
    age: "51",
    location: "texas"
  }
]

CodePudding user response:

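You can use a Set for efficient lookup. Collect the ids of set2 into a Set, then keep the items of set1 whose id is in it:

// every item of set1 already has an id from set1, so only set2's ids need a Set
const ids2 = new Set(set2.map(({id}) => id));
const output = set1.filter(({id}) => ids2.has(id));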
console.log(output);
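
Since the question asks specifically about a hash map, here is a minimal sketch of the same lookup idea built on a Map instead. The helper name intersectById and the sample arrays left and right are made up for illustration; it assumes every record has a string id, and that when an id repeats in the first list the last record with that id is the one to keep.

```javascript
// Hypothetical helper: returns the records of `left` whose id also occurs in `right`.
function intersectById(left, right) {
  // Index the left-hand records by id in a single pass.
  // If an id repeats in `left`, the last record with that id wins.
  const byId = new Map(left.map((item) => [item.id, item]));
  const seen = new Set();
  const result = [];
  for (const { id } of right) {
    // Map and Set lookups take constant time on average,
    // so this loop is linear in the length of `right`.
    if (byId.has(id) && !seen.has(id)) {
      seen.add(id);
      result.push(byId.get(id));
    }
  }
  return result;
}

const left = [
  { id: "001", name: "bob", age: "50" },
  { id: "002", name: "bill", age: "51" },
  { id: "003", name: "ben", age: "52" },
];
const right = [
  { id: "002", name: "bill" },
  { id: "001", name: "bob" },
];

const common = intersectById(left, right);
console.log(common.map((item) => item.id)); // [ '002', '001' ]
```

The result follows the order of the second list and contains each id at most once. Overall this does work proportional to the combined length of the two lists, compared with the product of their lengths for two nested loops.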

CodePudding user response:

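First we remove duplicates within each set, using reduce to group the items by id. Then we combine both sets into one long array and group by id again, this time collecting every item whose id shows up a second time. Because each set was deduplicated beforehand, an id can only repeat in the combined array if it occurs in both sets, so the collected items are the common ones.

Edit: fixed the algorithm so that ids repeated within a single set are no longer reported as common; see code.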

function findUnion(set1, set2) {

  // first remove duplicates from each set
  // bonus: collect duplicates
  var duplicates;

  function dedup(set) {
    duplicates = []
    return Object.values(set.reduce(function(agg, item) {
      if (agg[item.id]) {
        duplicates.push(item)
      }
      agg[item.id] = item;
      return agg
    }, {}));
  }

  set1 = dedup(set1);
  set2 = dedup(set2);

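  // then combine, with set2 first, so that the repeated items collected
  // below are the full records from set1
  var combined = [...set2, ...set1]

  // then dedup again; this time the collected duplicates are the result
  dedup(combined)
  return duplicates;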
}

// data set 1
const set1 = [{
    id: "001",
    name: "bob",
    age: "50",
    location: "texas"
  },
  {
    id: "002",
    name: "bill",
    age: "51",
    location: "texas"
  },
  {
    id: "003",
    name: "ben",
    age: "52",
    location: "texas"
  },
  {
    id: "004",
    name: "cam",
    age: "53",
    location: "texas"
  },
  {
    id: "005",
    name: "max",
    age: "54",
    location: "texas"
  },
  {
    id: "005",
    name: "max",
    age: "54",
    location: "texas"
  },
  {
    id: "005",
    name: "max",
    age: "54",
    location: "texas"
  }
]

// data set 2
const set2 = [{
    id: "001",
    name: "bob"
  },
  {
    id: "002",
    name: "bill"
  }
]


// desired output 
const output = [{
    id: "001",
    name: "bob",
    age: "50",
    location: "texas"
  },
  {
    id: "002",
    name: "bill",
    age: "51",
    location: "texas"
  }
]

console.log(findUnion(set1, set2))
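
The idea of counting appearances per id can also be written directly with a Map. The following is only a sketch under assumptions: the helper name idsInEveryList and the sample arrays a and b are hypothetical, and it returns the shared ids rather than the full records. Each list is reduced to a Set of its ids first, so an id repeated inside one list is counted once for that list.

```javascript
// Hypothetical helper: returns the ids that occur in every one of the given lists.
function idsInEveryList(...lists) {
  const counts = new Map();
  for (const list of lists) {
    // A Set per list ensures a repeated id counts only once for that list.
    for (const id of new Set(list.map((item) => item.id))) {
      counts.set(id, (counts.get(id) || 0) + 1);
    }
  }
  // An id is shared when it was counted once for each list.
  return [...counts]
    .filter(([, count]) => count === lists.length)
    .map(([id]) => id);
}

const a = [{ id: "001" }, { id: "005" }, { id: "005" }];
const b = [{ id: "001" }, { id: "002" }];

const sharedIds = idsInEveryList(a, b);
console.log(sharedIds); // [ '001' ]
```

Here "005" appears twice in a but never in b, so it is not reported as shared.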
