I have the following documents, where each document has fields values and dates, which are arrays. These arrays always have the same size in each document, meaning each value in dates corresponds to a value in values:
[
  {
    _id: "Stock1",
    values: [1, 2, 3],
    dates: [
      ISODate("2000-01-01"),
      ISODate("2010-01-01"),
      ISODate("2020-01-01")
    ]
  },
  {
    _id: "Stock2",
    values: [4, 5],
    dates: [
      ISODate("2000-01-01"),
      ISODate("2010-01-01")
    ]
  },
  {
    _id: "Stock3",
    values: [7, 8, 9],
    dates: [
      ISODate("2000-01-01"),
      ISODate("2010-01-01"),
      ISODate("2020-01-01")
    ]
  }
]
I would like to query my documents such that I obtain the values whose dates lie between "2010-01-01" and "2020-01-01" (both included), for "Stock1" and "Stock3" only, i.e. I would like to end up with:
[
  {
    _id: "Stock1",
    values: [2, 3],
    dates: [
      ISODate("2010-01-01"),
      ISODate("2020-01-01")
    ]
  },
  {
    _id: "Stock3",
    values: [8, 9],
    dates: [
      ISODate("2010-01-01"),
      ISODate("2020-01-01")
    ]
  }
]
For the moment, I am doing the following:
db.collection.aggregate([
  {
    $match: {
      _id: { $in: ["Stock1", "Stock3"] }
    }
  },
  {
    $unwind: {
      path: "$dates",
      includeArrayIndex: "date_index"
    }
  },
  {
    $match: {
      dates: {
        $gte: ISODate("2010-01-01"),
        $lte: ISODate("2020-01-01")
      }
    }
  },
  {
    $unwind: {
      path: "$values",
      includeArrayIndex: "value_index"
    }
  },
  {
    $match: {
      $expr: { $eq: ["$date_index", "$value_index"] }
    }
  },
  {
    $project: {
      date_index: 0,
      value_index: 0
    }
  }
])
But I am not quite there yet. Also, the pipeline looks long and sub-optimal. Are there better ways to do this? Finally, I am first using $unwind on dates, then filtering, before using $unwind again on values: this is to keep the number of documents in the pipeline from getting too big, as the dates and values arrays can potentially be huge.
Any help would be appreciated!
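I suspect the missing piece is a final $group stage that collapses the unwound documents back into per-stock arrays. A minimal sketch, assuming $push sees the documents in the order the $unwind stages emitted them (a $sort on date_index before the $project would make that order explicit):

// appended as the last stage of the pipeline above;
// after both $unwind stages, "dates" and "values" are scalars,
// so $push collects them back into arrays per _id
{
  $group: {
    _id: "$_id",
    values: { $push: "$values" },
    dates: { $push: "$dates" }
  }
}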
CodePudding user response:
Query
- match to keep only "Stock1" and "Stock3"
- filter on the indexes of dates (range (size "$dates")) to get only the indexes of the elements whose date falls in the range [2010, 2020]
- two $map stages to get the elements at those filtered indexes from dates and values

*We could do all of this in one $reduce, but the code would be nested and more complicated; also, $concatArrays is slow and it is not a good idea to reduce an array to an array. The below is simpler and can work even for very big arrays.
aggregate([
  {"$match": {"_id": {"$in": ["Stock1", "Stock3"]}}},
  {"$set":
    {"indexes":
      {"$filter":
        {"input": {"$range": [0, {"$size": "$dates"}]},
         "cond":
          {"$and":
            [{"$gte":
               [{"$arrayElemAt": ["$dates", "$$this"]},
                ISODate("2010-01-01T00:00:00Z")]},
             {"$lte":
               [{"$arrayElemAt": ["$dates", "$$this"]},
                ISODate("2020-01-01T00:00:00Z")]}]}}}}},
  {"$set":
    {"dates":
      {"$map":
        {"input": "$indexes",
         "in": {"$arrayElemAt": ["$dates", "$$this"]}}}}},
  {"$set":
    {"values":
      {"$map":
        {"input": "$indexes",
         "in": {"$arrayElemAt": ["$values", "$$this"]}}}}},
  {"$unset": ["indexes"]}])