Return duplicate objects in array of large data using JavaScript

Time:11-03

Using JavaScript, I have an array of objects and I'm trying to find the duplicate entries, so that I can eventually pass them to a separate function that removes them from a database.

My sample array could be:

const myArray = [
    { 'id': 111, 'lorem': 'ipsum' },
    { 'id': 222, 'lorem': 'dorem' },
    { 'id': 111, 'lorem': 'polus' },
    { 'id': 111, 'lorem': 'waifu' },
]

I want to return an array of all items whose id appears more than once. In this example, the returned array would be:

[
    { 'id': 111, 'lorem': 'ipsum' },
    { 'id': 111, 'lorem': 'polus' },
    { 'id': 111, 'lorem': 'waifu' },
]

Most online tutorials iterate over a short list of data, which is fine for small examples. But my dataset is in the thousands and could reach millions as my data grows, so I'm trying to find a smarter way of handling this logic.

I understand that I can use a Set, but that doesn't give me the duplicate entries; it gives me a collection of unique values. I need the duplicates themselves, not a new array with the duplicates removed.

Without using a third party such as lodash or underscore, how would I ideally iterate over an array with unknown size, to eventually return the duplicate items for me to pass up the stream for processing?

CodePudding user response:

This may take a while on a large array, so I recommend you do this on the server.

// create an array with random IDs
const myArray = []
for (let i = 0; i < 10000; i++) {
  myArray.push({
    id: String(Math.floor(Math.random() * 10000)).padStart(3, "0"),
    "lorem": "ipsum"
  })
}

// examine them
const ids = []
const dupes = []
myArray.forEach(({id}) => {
  if (ids.includes(id)) dupes.push(id);
  else ids.push(id)
})
console.log(ids.length, dupes, dupes.length)
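The `ids.includes(id)` check above scans the whole `ids` array for every item, so the work grows roughly quadratically with the array size. A minimal sketch of the same idea using `Set` lookups instead is shown below; `findDuplicateIds`, `sample`, `duplicateIds` and `duplicateItems` are hypothetical names chosen for illustration, not part of the answer above.

```javascript
// Hypothetical helper: returns a Set of the ids that occur more than once.
function findDuplicateIds(items) {
  const seen = new Set();       // ids encountered so far
  const duplicated = new Set(); // ids encountered at least twice
  for (const { id } of items) {
    if (seen.has(id)) duplicated.add(id);
    else seen.add(id);
  }
  return duplicated;
}

// Sample data taken from the question.
const sample = [
  { id: 111, lorem: 'ipsum' },
  { id: 222, lorem: 'dorem' },
  { id: 111, lorem: 'polus' },
  { id: 111, lorem: 'waifu' },
];

const duplicateIds = findDuplicateIds(sample);

// Second pass: keep every item whose id was flagged as duplicated.
const duplicateItems = sample.filter(({ id }) => duplicateIds.has(id));
console.log(duplicateItems); // logs the three items with id 111
```

Each `Set` lookup is typically constant time, so this makes two linear passes over the data. Whether that is fast enough for millions of rows depends on the environment, so it is worth measuring on real data.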

CodePudding user response:

  • Using Array#reduce, iterate over the array while updating a Map to group items by id
  • Using Map#values, get the list of grouped arrays
  • Using Array#filter, keep the arrays with more than one item
  • Using Array#flat, return all arrays in one list

const myArray = [ { 'id': 111, 'lorem': 'ipsum' }, { 'id': 222, 'lorem': 'dorem' }, { 'id': 111, 'lorem': 'polus' }, { 'id': 111, 'lorem': 'waifu' } ];

const duplicates = 
  [...myArray.reduce((map, item) => // group items by id
    map.set(item.id, [...(map.get(item.id) ?? []), item])
  , new Map)
  .values()] // get grouped arrays
  .filter(list => list.length > 1) // keep duplicates
  .flat(); // return one array

console.log(duplicates);
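One thing to keep in mind for large inputs: the spread `[...(map.get(item.id) ?? []), item]` copies the group's array each time an item is added, which can become costly when a single id repeats many times. A possible alternative, sketched here under the assumption that only the final list of duplicates is needed, is to count occurrences first and filter afterwards. The names `findDuplicatesBy`, `records` and `dupes` are hypothetical.

```javascript
// Hypothetical helper: returns every item whose `key` value occurs more than once.
function findDuplicatesBy(items, key) {
  // First pass: count how often each key value appears.
  const counts = new Map();
  for (const item of items) {
    counts.set(item[key], (counts.get(item[key]) ?? 0) + 1);
  }
  // Second pass: keep the items whose key value was counted more than once.
  return items.filter((item) => counts.get(item[key]) > 1);
}

// Sample data taken from the question.
const records = [
  { id: 111, lorem: 'ipsum' },
  { id: 222, lorem: 'dorem' },
  { id: 111, lorem: 'polus' },
  { id: 111, lorem: 'waifu' },
];

const dupes = findDuplicatesBy(records, 'id');
console.log(dupes.length); // 3
```

Unlike the grouped version, this keeps the duplicates in their original array order rather than grouped by id, which may or may not matter for the downstream database cleanup.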
