Home > front end >  Calculate average of array elements using MongoDB aggregation
Calculate average of array elements using MongoDB aggregation

Time:11-28

I have a collection of documents such as:

{
    _id: 5,
    responses: [
        {
            staff: false,
            timestamp: 1000
        },
        {
            staff: true,
            timestamp: 1500
        }
    ]
}

I have a function (using $function) used to apply some custom logic with the responses array:

const diffs=[];
let current;
for (let i = 0; i < responses.length; i  ) {
  if (i 1>=responses.length) break;
  if (!current && !responses[i].staff) current = responses[i];
  if (!current) continue;
  const next = responses[i 1];
  if (!next.staff) continue;
  diffs.push(next.timestamp - current.timestamp);
  current = undefined;
}
return diffs;

which basically returns array of numbers like [500, 1000, 10] etc. It can also potentially return an empty array ([]).

I want to basically combine all arrays into one (say one document returns [5, 10] and the next one returns [1, 2], the result would be [5, 10, 1, 2] -- order doesn't matter) and then calculate the average using $avg.

I was reading MongoDB docs and found $concatArrays, so to my understanding the process should be:

  1. Calculate diffs for each document, which will end up with an array like [[1, 2], [3, 4], [5, 6, 7], [], ...]
  2. Use $concatArrays on the value from step 1
  3. Use $avg on the array from step 2

How should I go about making step 1? The only part I'm not sure about is how to hold a variable in the first grouping stage with the result returned from $function. I understand I need to do something like this:

aggregate([
    {$group: {diffs: {$function: {...}}}}
])

However, I get the error MongoServerError: unknown group operator '$function'.

CodePudding user response:

I ended up getting it working the way I wanted. Sorry if there was some confusion over the data to anyone reading the original question -- the basic gist of the problem was, for each document the function calculates an array which are all numbers (like [1, 2, 3]), and I wanted to make one big array with all the numbers combined from all the documents, then calculate an average with it.

I got it working by first calculating the array for each document (diffs), unwinding the array, using $push to push each element of the array into a new one, and then finally using $avg on the created array. Hopefully this helps someone with a similar problem down the line.

db.tickets.aggregate([
    {
        $project: {
            responseTimes: {
                $function: {
                    body: function(responses) {
                        const diffs=[];
                        let current;
                        for (let i = 0; i < responses.length; i  ) {
                            if (i 1>=responses.length) break;
                            if (!current && !responses[i].staff) current = responses[i];
                            if (!current) continue;
                            const next = responses[i 1];
                            if (!next.staff) continue;
                            diffs.push(next.timestamp - current.timestamp);
                            current = undefined;
                        }
                        return diffs;
                    },
                    args: ["$responses"],
                    lang: "js"
                }
            }
        }
    },

    {$unwind: "$responseTimes"},

    {
        $group: {
            _id: null,

            responseTimes: {
                $push: "$responseTimes"
            }
        }
    },

    {
        $project: {
            avgResponseTime: {
                $avg: "$responseTimes"
            }
        }
    }
])

CodePudding user response:

Documentation says:

Executing JavaScript inside an aggregation expression may decrease performance. Only use the $function operator if the provided pipeline operators cannot fulfill your application's needs.

A starting point to achieve the same natively with an aggregation pipeline could be this one. I did not spend much time in testing, but this would be the direction.

db.collection.aggregate([
  {
    $set: {
      responseTimes: {
        $reduce: {
          input: "$responses",
          initialValue: [],
          in: {
            $cond: {
              if: { $not: "$$this.staff" },
              then: { $concatArrays: [ "$$value", [ "$$this.timestamp" ] ] },
              else: {
                $cond: {
                  if: { $gt: [ { $size: "$$value" }, 0 ] },
                  then: {
                    $concatArrays: [
                      { $slice: [ "$$value", { $subtract: [ { $size: "$$value" }, 1 ]  } ] },
                      [ { $subtract: [ "$$this.timestamp", { $last: "$$value" } ] } ]
                    ]
                  },
                  else: null
                }
              }
            }
          }
        }
      }
    }
  },
  {
    $project: {
      avgResponseTime: { $avg: "$responseTimes" }
    }
  }
])

https://mongoplayground.net/p/Ie6IL8atLX-

A different approach is to use input: { $range: [ 0, {$subtract: [{ $size: "$responses" }, 1]} ] }, and then access the elements with { $arrayElemAt: [ "$responses", "$$this" ] }, maybe see this example which is a similar use-case.

  • Related