Home > database >  Merging multiple objects into a single object in the mongodb aggregation
Merging multiple objects into a single object in the mongodb aggregation

Time:11-04

So this is the scenario I'm in. I have below kind of an array which returns by an aggregation stage. for simplicity, removed all unnecessary props. The array is already sorted by date, which means the clocked_in_at property will be in the order of when each object is created.

[
        {
            "_id": "618192d4654484639c47fa2d",
            "clocked_out_at": "2021-11-05T10:00:00.000Z",
            "clocked_in_at": "2021-11-05T03:00:00.000Z",
            "visitor_id": "6166c10965959d147c69aa90" // this is here as a string
        },
        {
            "_id": "6182552fde30e84900ba33fd",
            "clocked_out_at": "2021-11-05T11:00:00.000Z",
            "clocked_in_at": "2021-11-05T04:00:00.000Z",
            "visitor_id": "6182e4cea8b52121d01dff1b"
        },
        {
            "_id": "6182552fde30e84900ba33fd",
            "clocked_out_at": "2021-11-05T12:00:00.000Z",
            "clocked_in_at": "2021-11-05T05:00:00.000Z",
            "visitor_id": "6166c10965959d147c69aa90"
        },
        {
            "_id": "6182552fde30e84900ba33fd",
            "clocked_out_at": "2021-11-06T13:00:00.000Z",
            "clocked_in_at": "2021-11-06T06:00:00.000Z",
            "visitor_id": "6166c10965959d147c69aa90"
        }
]

So you can see the first, third, and last objects are from the same visitor whilst the second one is from another visitor. so what I basically need is that to merge all the objects in the array based on the visitor_id and the date/time and set the clocked_out_at value from the last existing value in the array for that same day if that makes sense. Basically, we need to group them separately based on the clocked_in_at values. the same visitor should still be in two objects if they're in different clocked_in_at dates.

So the expected output is something like:

[
        {
            "_id": "618192d4654484639c47fa2d",
            "clocked_out_at": "2021-11-05T12:00:00.000Z",
            "clocked_in_at": "2021-11-05T03:00:00.000Z",
            "visitor_id": "6166c10965959d147c69aa90"
        },
        {
            "_id": "6182552fde30e84900ba33fd",
            "clocked_out_at": "2021-11-05T11:00:00.000Z",
            "clocked_in_at": "2021-11-05T04:00:00.000Z",
            "visitor_id": "6182e4cea8b52121d01dff1b"
        },
        {
            "_id": "6182552fde30e84900ba33fd",
            "clocked_out_at": "2021-11-06T13:00:00.000Z",
            "clocked_in_at": "2021-11-06T06:00:00.000Z",
            "visitor_id": "6166c10965959d147c69aa90"
        },
]

So, here you can see the first and 3rd objects in the original array were merged. because they had the same date(disregarding the time) for clocked_in_at and the visitor_id. Even though the last object is from the same visitor_id, it did not merge with because it's on 6th Nov., And the second object is pretty obvious, it wasn't merged because it has a totally different visitor_id.

Notice that the clocked_out_at value of the 3rd object in the original array got merged into the resultant merged object which is the first object's clocked_out_at in the resultant array.

I'm not exactly sure about the possibility of doing this, but I'd love to know whether we have any solution. I hope something like $mergeObjects or $group. I tried with them but got no luck.

Appreciate your time, thanks!

CodePudding user response:

use $group

db.collection.aggregate([
  {
    "$group": {
      "_id": {
        "clocked_in_at": {
          $dateTrunc: {
            date: {"$toDate": "$clocked_in_at" },
            unit: "day"
          }
        },
        "visitor_id": "$visitor_id"
      },
      "max": { "$max": "$clocked_out_at" },
      "min": { "$min": "$clocked_in_at"},
      "id": { "$first": "$id" }
    }
  },
  {
    "$project": {
      _id: "$id",
      "visitor_id": "$_id.visitor_id",
      "clocked_out_at": "$max",
      "clocked_in_at": "$min"
    }
  }
])

mongoplayground

CodePudding user response:

If clocked_in_at and clocked_out_at are strings we can use $toDate to first convert thes to dates (which will make ordering easier). If they are already dates we can skip this step.

Then we can $project each clocked in and clocked out times into an array of objects which contain the day and the value. $dateTrunc is used to convert clocked_in_at and clocked_out_at to days, then $unwind the newly created datetime field. Now we can $group by the day ("$datetime.day") and visitor_id` keeping the $min in time and $max out time per day. We can $project again to clean up object structure:

db.collection.aggregate([
  // (Assuming strings not dates) Convert to DateTime
  {
    "$addFields": {
      "clocked_in_at": {
        "$toDate": "$clocked_in_at"
      },
      "clocked_out_at": {
        "$toDate": "$clocked_out_at"
      }
    }
  },
  {
    "$project": {
      "s_id": "$s_id",
      "visitor_id": "$visitor_id",
      "datetime": [
        {
          "day": {
            "$dateTrunc": {
              "date": "$clocked_in_at",
              "unit": "day"
            }
          },
          "in": "$clocked_in_at"
        },
        {
          "day": {
            "$dateTrunc": {
              "date": "$clocked_out_at",
              "unit": "day"
            }
          },
          "out": "$clocked_out_at"
        }
      ]
    }
  },
  {
    "$unwind": "$datetime"
  },
  {
    "$group": {
      "_id": {
        "visitor_id": "$visitor_id",
        "day": "$datetime.day"
      },
      "s_id": {
        "$first": "$s_id"
      },
      "clocked_in_at": {
        "$min": "$datetime.in"
      },
      "clocked_out_at": {
        "$max": "$datetime.out"
      }
    }
  },
  {
    "$project": {
      "_id": "$s_id",
      "clocked_out_at": "$clocked_out_at",
      "clocked_in_at": "$clocked_in_at",
      "visitor_id": "$_id.visitor_id"
    }
  }
])
[
  {
    "_id": "6182552fde30e84900ba33fd",
    "clocked_in_at": ISODate("2021-11-06T06:00:00Z"),
    "clocked_out_at": ISODate("2021-11-06T13:00:00Z"),
    "visitor_id": "6166c10965959d147c69aa90"
  },
  {
    "_id": "618192d4654484639c47fa2d",
    "clocked_in_at": ISODate("2021-11-05T03:00:00Z"),
    "clocked_out_at": ISODate("2021-11-05T12:00:00Z"),
    "visitor_id": "6166c10965959d147c69aa90"
  },
  {
    "_id": "6182552fde30e84900ba33fd",
    "clocked_in_at": ISODate("2021-11-05T04:00:00Z"),
    "clocked_out_at": ISODate("2021-11-05T11:00:00Z"),
    "visitor_id": "6182e4cea8b52121d01dff1b"
  }
]

mongoplayground

Note _id was not unique in the provided sample so the field was modified to s_id:

[
  {
    "s_id": "618192d4654484639c47fa2d",
    "clocked_out_at": "2021-11-05T10:00:00.000Z",
    "clocked_in_at": "2021-11-05T03:00:00.000Z",
    "visitor_id": "6166c10965959d147c69aa90"
  },
  {
    "s_id": "6182552fde30e84900ba33fd",
    "clocked_out_at": "2021-11-05T11:00:00.000Z",
    "clocked_in_at": "2021-11-05T04:00:00.000Z",
    "visitor_id": "6182e4cea8b52121d01dff1b"
  },
  {
    "s_id": "6182552fde30e84900ba33fd",
    "clocked_out_at": "2021-11-05T12:00:00.000Z",
    "clocked_in_at": "2021-11-05T05:00:00.000Z",
    "visitor_id": "6166c10965959d147c69aa90"
  },
  {
    "s_id": "6182552fde30e84900ba33fd",
    "clocked_out_at": "2021-11-06T13:00:00.000Z",
    "clocked_in_at": "2021-11-06T06:00:00.000Z",
    "visitor_id": "6166c10965959d147c69aa90"
  }
]

This means that the initial $project will need to be updated to:

  {
    "$project": {
      "s_id": "$_id", // <- grab `_id` instead of `s_id`
  • Related