MongoDB remove duplicates from a nested array

I have a users collection with the following structure:

[
  {
    name: "xxx",
    labels: [
      {
        category: "Language",
        values: ["English", "Spanish"],
      },
      {
        category: "Hobby",
        values: ["Read", "Cook", "Read"],
      },
    ]
  },
  {
    name: "yyy",
    labels: [
      {
        category: "Language",
        values: ["English", "English"],
      },
      {
        category: "Hobby",
        values: ["Read", "Play", "Play"],
      },
    ]
  },
]

I want to remove all duplicates from each values array, so the result would be:

[
  {
    name: "xxx",
    labels: [
      {
        category: "Language",
        values: ["English", "Spanish"],
      },
      {
        category: "Hobby",
        values: ["Read", "Cook"],
      },
    ]
  },
  {
    name: "yyy",
    labels: [
      {
        category: "Language",
        values: ["English"],
      },
      {
        category: "Hobby",
        values: ["Read", "Play"],
      },
    ]
  },
]

I tried to use $setUnion and $setIntersection, but I couldn't work out the right way to use them with a nested array.

CodePudding user response:

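For MongoDB 4.2 and later you can use an update with an aggregation pipeline for this. Note that $setUnion treats the array as a set, so the order of the resulting elements is not guaranteed: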

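db.collection.updateMany(
  {},
  [
    {
      $set: {
        labels: {
          // Rebuild every element of the labels array
          $map: {
            input: "$labels",
            in: {
              // Keep the element's other fields and overwrite values
              $mergeObjects: [
                "$$this",
                {
                  // $setUnion drops duplicate entries
                  values: {
                    $setUnion: "$$this.values"
                  }
                }
              ]
            }
          }
        }
      }
    }
  ]
)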

For older MongoDB versions you'll have to read each document into memory, remove the duplicates in application code, and then update each document separately.
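
A minimal sketch of that read-then-update approach in shell-style JavaScript. The helper names dedupeLabels and dedupeAllUsers are hypothetical, db is assumed to be a shell database handle, and the collection is assumed to be called users as in the question. Unlike $setUnion, this version keeps the first occurrence of each value, so the original order is preserved:

```javascript
// Hypothetical helper: returns a copy of `labels` in which every
// `values` array keeps only the first occurrence of each entry.
function dedupeLabels(labels) {
  return (labels || []).map(function (label) {
    var unique = (label.values || []).filter(function (value, index, all) {
      // Keep a value only at the position where it first appears
      return all.indexOf(value) === index;
    });
    // Copy the other fields of the label and overwrite `values`
    return Object.assign({}, label, { values: unique });
  });
}

// Hypothetical driver: `db` is assumed to be a shell database handle
// and `users` the collection from the question.
function dedupeAllUsers(db) {
  db.users.find({}, { labels: 1 }).forEach(function (doc) {
    // One update per document, as described above
    db.users.updateOne(
      { _id: doc._id },
      { $set: { labels: dedupeLabels(doc.labels) } }
    );
  });
}
```

In the shell this would be run as dedupeAllUsers(db). Because it issues one update per document, it is noticeably slower than the pipeline update on large collections.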