So this is the scenario I'm in. I have below kind of an array which returns by an aggregation stage. for simplicity, removed all unnecessary props. The array is already sorted by date, which means the clocked_in_at property will be in the order of when each object is created.
[
{
"_id": "618192d4654484639c47fa2d",
"clocked_out_at": "2021-11-05T10:00:00.000Z",
"clocked_in_at": "2021-11-05T03:00:00.000Z",
"visitor_id": "6166c10965959d147c69aa90" // this is here as a string
},
{
"_id": "6182552fde30e84900ba33fd",
"clocked_out_at": "2021-11-05T11:00:00.000Z",
"clocked_in_at": "2021-11-05T04:00:00.000Z",
"visitor_id": "6182e4cea8b52121d01dff1b"
},
{
"_id": "6182552fde30e84900ba33fd",
"clocked_out_at": "2021-11-05T12:00:00.000Z",
"clocked_in_at": "2021-11-05T05:00:00.000Z",
"visitor_id": "6166c10965959d147c69aa90"
},
{
"_id": "6182552fde30e84900ba33fd",
"clocked_out_at": "2021-11-06T13:00:00.000Z",
"clocked_in_at": "2021-11-06T06:00:00.000Z",
"visitor_id": "6166c10965959d147c69aa90"
}
]
So you can see the first, third, and last objects are from the same visitor whilst the second one is from another visitor. so what I basically need is that to merge all the objects in the array based on the visitor_id and the date/time and set the clocked_out_at value from the last existing value in the array for that same day if that makes sense. Basically, we need to group them separately based on the clocked_in_at values. the same visitor should still be in two objects if they're in different clocked_in_at dates.
So the expected output is something like:
[
{
"_id": "618192d4654484639c47fa2d",
"clocked_out_at": "2021-11-05T12:00:00.000Z",
"clocked_in_at": "2021-11-05T03:00:00.000Z",
"visitor_id": "6166c10965959d147c69aa90"
},
{
"_id": "6182552fde30e84900ba33fd",
"clocked_out_at": "2021-11-05T11:00:00.000Z",
"clocked_in_at": "2021-11-05T04:00:00.000Z",
"visitor_id": "6182e4cea8b52121d01dff1b"
},
{
"_id": "6182552fde30e84900ba33fd",
"clocked_out_at": "2021-11-06T13:00:00.000Z",
"clocked_in_at": "2021-11-06T06:00:00.000Z",
"visitor_id": "6166c10965959d147c69aa90"
},
]
So, here you can see the first and 3rd objects in the original array were merged. because they had the same date(disregarding the time) for clocked_in_at and the visitor_id. Even though the last object is from the same visitor_id, it did not merge with because it's on 6th Nov., And the second object is pretty obvious, it wasn't merged because it has a totally different visitor_id.
Notice that the clocked_out_at value of the 3rd object in the original array got merged into the resultant merged object which is the first object's clocked_out_at in the resultant array.
I'm not exactly sure about the possibility of doing this, but I'd love to know whether we have any solution. I hope something like $mergeObjects or $group. I tried with them but got no luck.
Appreciate your time, thanks!
CodePudding user response:
use $group
db.collection.aggregate([
{
"$group": {
"_id": {
"clocked_in_at": {
$dateTrunc: {
date: {"$toDate": "$clocked_in_at" },
unit: "day"
}
},
"visitor_id": "$visitor_id"
},
"max": { "$max": "$clocked_out_at" },
"min": { "$min": "$clocked_in_at"},
"id": { "$first": "$id" }
}
},
{
"$project": {
_id: "$id",
"visitor_id": "$_id.visitor_id",
"clocked_out_at": "$max",
"clocked_in_at": "$min"
}
}
])
CodePudding user response:
If clocked_in_at
and clocked_out_at
are strings we can use $toDate to first convert thes to dates (which will make ordering easier). If they are already dates we can skip this step.
Then we can $project each clocked in and clocked out times into an array of objects which contain the day and the value. $dateTrunc is used to convert clocked_in_at
and clocked_out_at
to days, then $unwind the newly created datetime
field. Now we can $group by the day ("$datetime.day") and
visitor_id` keeping the $min in time and $max out time per day. We can $project again to clean up object structure:
db.collection.aggregate([
// (Assuming strings not dates) Convert to DateTime
{
"$addFields": {
"clocked_in_at": {
"$toDate": "$clocked_in_at"
},
"clocked_out_at": {
"$toDate": "$clocked_out_at"
}
}
},
{
"$project": {
"s_id": "$s_id",
"visitor_id": "$visitor_id",
"datetime": [
{
"day": {
"$dateTrunc": {
"date": "$clocked_in_at",
"unit": "day"
}
},
"in": "$clocked_in_at"
},
{
"day": {
"$dateTrunc": {
"date": "$clocked_out_at",
"unit": "day"
}
},
"out": "$clocked_out_at"
}
]
}
},
{
"$unwind": "$datetime"
},
{
"$group": {
"_id": {
"visitor_id": "$visitor_id",
"day": "$datetime.day"
},
"s_id": {
"$first": "$s_id"
},
"clocked_in_at": {
"$min": "$datetime.in"
},
"clocked_out_at": {
"$max": "$datetime.out"
}
}
},
{
"$project": {
"_id": "$s_id",
"clocked_out_at": "$clocked_out_at",
"clocked_in_at": "$clocked_in_at",
"visitor_id": "$_id.visitor_id"
}
}
])
[
{
"_id": "6182552fde30e84900ba33fd",
"clocked_in_at": ISODate("2021-11-06T06:00:00Z"),
"clocked_out_at": ISODate("2021-11-06T13:00:00Z"),
"visitor_id": "6166c10965959d147c69aa90"
},
{
"_id": "618192d4654484639c47fa2d",
"clocked_in_at": ISODate("2021-11-05T03:00:00Z"),
"clocked_out_at": ISODate("2021-11-05T12:00:00Z"),
"visitor_id": "6166c10965959d147c69aa90"
},
{
"_id": "6182552fde30e84900ba33fd",
"clocked_in_at": ISODate("2021-11-05T04:00:00Z"),
"clocked_out_at": ISODate("2021-11-05T11:00:00Z"),
"visitor_id": "6182e4cea8b52121d01dff1b"
}
]
Note _id
was not unique in the provided sample so the field was modified to s_id
:
[
{
"s_id": "618192d4654484639c47fa2d",
"clocked_out_at": "2021-11-05T10:00:00.000Z",
"clocked_in_at": "2021-11-05T03:00:00.000Z",
"visitor_id": "6166c10965959d147c69aa90"
},
{
"s_id": "6182552fde30e84900ba33fd",
"clocked_out_at": "2021-11-05T11:00:00.000Z",
"clocked_in_at": "2021-11-05T04:00:00.000Z",
"visitor_id": "6182e4cea8b52121d01dff1b"
},
{
"s_id": "6182552fde30e84900ba33fd",
"clocked_out_at": "2021-11-05T12:00:00.000Z",
"clocked_in_at": "2021-11-05T05:00:00.000Z",
"visitor_id": "6166c10965959d147c69aa90"
},
{
"s_id": "6182552fde30e84900ba33fd",
"clocked_out_at": "2021-11-06T13:00:00.000Z",
"clocked_in_at": "2021-11-06T06:00:00.000Z",
"visitor_id": "6166c10965959d147c69aa90"
}
]
This means that the initial $project will need to be updated to:
{
"$project": {
"s_id": "$_id", // <- grab `_id` instead of `s_id`