Home > Software engineering >  Using setWindowFields to compute change in values
Using setWindowFields to compute change in values

Time:04-08

I have a large set of records in this schema:

{
   username: "[email protected]",
   timestamp: 1646006400000 //unix timestamp in milliseconds
   amount: 100
}

Given an arbitrary range of dates, I need to find the change in average amount value vs a similar prior range of dates.

For example, if the input is March, 10 to March, 15, I need to calculate the average in that period and the difference with the average in the March, 5 to March 10 period.

This is how I planned it:

  1. given the from_date & to_date, calculate the interval and subtract to get prior_from_date & prior_to_date
  2. match all the records from prior_from_date to to_date
  3. use setWindowFields to create a flag prior on the prior records
  4. group by username and then group by prior flag to calculate averages

I can compute step 1 on the app side (PHP). I'm able to get the average using $avg but am unable to figure out how to do step 3. This is what I have so far:

{
  partitionBy: '$username',
  sortBy: {
    timestamp: 1
  },
  output: {
    prior: {
      $sum: '$prior',
      window: {
        range: [
          -86400000, // milliseconds in a day - may be multiply by interval from PHP?
          0
        ]
      }
    }
  }
}

This is the final desired output:

{
 {
   username:"[email protected]",
   prior: {
      average: 50
   },
   current: {
      average: 74
   }
 },
 {
   username:"[email protected]",
   prior: {
       average: 73
   },
   current: {
       average: 33
   }
 }
}

CodePudding user response:

Your requirements are not so clear to me, but could be this one:

db.collection.aggregate([
   { $set: { timestamp: { $toDate: "$timestamp" } } },
   {
      $setWindowFields: {
         partitionBy: "$username",
         sortBy: { timestamp: 1 },
         output: {
            average: {
               $avg: "$amount",
               window: { range: [-5, 5], unit: "day" }
            }
         }
      }
   }
])

CodePudding user response:

Since you are using a $match step and you already have the timestamps calculated, you can just group using $addFields with a condition to get the same effect:

db.collection.aggregate([
  {
    $match: {
      timestamp: {
        $gte: 1646438400000,
        $lt: 1647302400000
      }
    }
  },
  {
    "$addFields": {
      "period": {
        "$cond": [
          {$gt: ["$timestamp", 1646870400000]},
          "current",
          "prior"
        ]
      }
    }
  },
  {
    $group: {
      _id: {period: "$period", username: "$username"},
      average: {$avg: "$amount"}
    }
  },
  {
    $project: {
      username: "$_id.username",
      "data.k": "$_id.period",
      "data.v": {average: "$average"},
      _id: 0
    }
  },
  {
    $group: {_id: "$username",
      data: {$push: "$data"}
    }
  },
  {
    $project: {
      "data": {"$arrayToObject": "$data"}
    }
  },
  {
    $project: {
      _id: 1,
      current: "$data.current",
      prior: "$data.prior"
    }
  }
])

As you can see on the playground : which returns this:

    [
  {
    "_id": "[email protected]",
    "current": {
      "average": 42.5
    },
    "prior": {
      "average": 68
    }
  },
  {
    "_id": "[email protected]",
    "current": {
      "average": 55
    },
    "prior": {
      "average": 37
    }
  }
]
  • Related