I have a large set of records in this schema:
{
username: "[email protected]",
timestamp: 1646006400000 //unix timestamp in milliseconds
amount: 100
}
Given an arbitrary range of dates, I need to find the change in average amount value vs a similar prior range of dates.
For example, if the input is March, 10 to March, 15, I need to calculate the average in that period and the difference with the average in the March, 5 to March 10 period.
This is how I planned it:
- given the
from_date
&to_date
, calculate the interval and subtract to getprior_from_date
&prior_to_date
match
all the records fromprior_from_date
toto_date
- use setWindowFields to create a flag
prior
on the prior records group
byusername
and then group byprior
flag to calculate averages
I can compute step 1 on the app side (PHP). I'm able to get the average using $avg
but am unable to figure out how to do step 3. This is what I have so far:
{
partitionBy: '$username',
sortBy: {
timestamp: 1
},
output: {
prior: {
$sum: '$prior',
window: {
range: [
-86400000, // milliseconds in a day - may be multiply by interval from PHP?
0
]
}
}
}
}
This is the final desired output:
{
{
username:"[email protected]",
prior: {
average: 50
},
current: {
average: 74
}
},
{
username:"[email protected]",
prior: {
average: 73
},
current: {
average: 33
}
}
}
CodePudding user response:
Your requirements are not so clear to me, but could be this one:
db.collection.aggregate([
{ $set: { timestamp: { $toDate: "$timestamp" } } },
{
$setWindowFields: {
partitionBy: "$username",
sortBy: { timestamp: 1 },
output: {
average: {
$avg: "$amount",
window: { range: [-5, 5], unit: "day" }
}
}
}
}
])
CodePudding user response:
Since you are using a $match
step and you already have the timestamps calculated, you can just group using $addFields
with a condition to get the same effect:
db.collection.aggregate([
{
$match: {
timestamp: {
$gte: 1646438400000,
$lt: 1647302400000
}
}
},
{
"$addFields": {
"period": {
"$cond": [
{$gt: ["$timestamp", 1646870400000]},
"current",
"prior"
]
}
}
},
{
$group: {
_id: {period: "$period", username: "$username"},
average: {$avg: "$amount"}
}
},
{
$project: {
username: "$_id.username",
"data.k": "$_id.period",
"data.v": {average: "$average"},
_id: 0
}
},
{
$group: {_id: "$username",
data: {$push: "$data"}
}
},
{
$project: {
"data": {"$arrayToObject": "$data"}
}
},
{
$project: {
_id: 1,
current: "$data.current",
prior: "$data.prior"
}
}
])
As you can see on the playground : which returns this:
[
{
"_id": "[email protected]",
"current": {
"average": 42.5
},
"prior": {
"average": 68
}
},
{
"_id": "[email protected]",
"current": {
"average": 55
},
"prior": {
"average": 37
}
}
]