I have the following documents, where each document has fields values and dates, which are arrays. These arrays always have the same size in each document, meaning each value in dates corresponds to a value in values:
[
  {
    _id: "Stock1",
    values: [1, 2, 3],
    dates: [
      ISODate("2000-01-01"),
      ISODate("2010-01-01"),
      ISODate("2020-01-01")
    ]
  },
  {
    _id: "Stock2",
    values: [4, 5],
    dates: [
      ISODate("2000-01-01"),
      ISODate("2010-01-01")
    ]
  },
  {
    _id: "Stock3",
    values: [7, 8, 9],
    dates: [
      ISODate("2000-01-01"),
      ISODate("2010-01-01"),
      ISODate("2020-01-01")
    ]
  }
]
I would like to query my documents such that I obtain the values whose dates lie between "2010-01-01" and "2020-01-01" (both included), for "Stock1" and "Stock3" only, i.e. I would like to end up with:
[
  {
    _id: "Stock1",
    values: [2, 3],
    dates: [
      ISODate("2010-01-01"),
      ISODate("2020-01-01")
    ]
  },
  {
    _id: "Stock3",
    values: [8, 9],
    dates: [
      ISODate("2010-01-01"),
      ISODate("2020-01-01")
    ]
  }
]
For the moment, I am doing the following:
db.collection.aggregate([
  {
    $match: {
      _id: { $in: ["Stock1", "Stock3"] }
    }
  },
  {
    $unwind: {
      path: "$dates",
      includeArrayIndex: "date_index"
    }
  },
  {
    $match: {
      dates: {
        $gte: ISODate("2010-01-01"),
        $lte: ISODate("2020-01-01")
      }
    }
  },
  {
    $unwind: {
      path: "$values",
      includeArrayIndex: "value_index"
    }
  },
  {
    $match: {
      $expr: { $eq: ["$date_index", "$value_index"] }
    }
  },
  {
    $project: {
      date_index: 0,
      value_index: 0
    }
  }
])
But I am not quite there yet. Also, the pipeline looks long and sub-optimal. Are there better ways to do this? Finally, I am first using $unwind on dates, then filtering, before using $unwind again on values: this is to keep the number of documents in the pipeline from getting too big, as the dates and values arrays can potentially be huge.
Any help would be appreciated!
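I suspect the missing piece is a final $group stage that collapses the unwound documents back into per-stock arrays. A minimal sketch, assuming $push sees the documents in the order the $unwind stages emitted them (a $sort on date_index before the $project would make that order explicit):

// appended as the last stage of the pipeline above;
// after both $unwind stages, "dates" and "values" are scalars,
// so $push collects them back into arrays per _id
{
  $group: {
    _id: "$_id",
    values: { $push: "$values" },
    dates: { $push: "$dates" }
  }
}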
CodePudding user response:
Query
- match to keep only "Stock1" and "Stock3"
- filter on the indexes of dates (range (size "$dates")) to get only the indexes of the elements whose date falls in the range [2010, 2020]
- two $map stages to get the elements at those filtered indexes from dates and values

*We could do all of this in one $reduce, but the code would be nested and more complicated; also, $concatArrays is slow and it is not a good idea to reduce an array to an array. The below is simpler and can work even for very big arrays.
aggregate([
  {"$match": {"_id": {"$in": ["Stock1", "Stock3"]}}},
  {"$set":
    {"indexes":
      {"$filter":
        {"input": {"$range": [0, {"$size": "$dates"}]},
         "cond":
          {"$and":
            [{"$gte":
               [{"$arrayElemAt": ["$dates", "$$this"]},
                ISODate("2010-01-01T00:00:00Z")]},
             {"$lte":
               [{"$arrayElemAt": ["$dates", "$$this"]},
                ISODate("2020-01-01T00:00:00Z")]}]}}}}},
  {"$set":
    {"dates":
      {"$map":
        {"input": "$indexes",
         "in": {"$arrayElemAt": ["$dates", "$$this"]}}}}},
  {"$set":
    {"values":
      {"$map":
        {"input": "$indexes",
         "in": {"$arrayElemAt": ["$values", "$$this"]}}}}},
  {"$unset": ["indexes"]}])