I have approx. 1.1 million records in Mongo and have written the query below to get the filtered data. When I try to run the query in MongoDB Compass, it fails with a timeout-exceeded error. I am new to Mongo and don't have much idea how to optimize it.
[
{
$match: {
offerCheckedDate: {
$gte: ISODate("2022-11-14T00:00:00.000Z"),
$lt: ISODate("2022-12-20T00:00:00.000Z"),
},
offerAvailable: "YES",
channelId: {
$in: [
1000001,
1000000
]
}
},
},
{
$group: {
_id: {
mobile: "$mobile",
},
mobile: {
$addToSet: "$mobile",
},
},
},
{
$unwind: "$mobile",
},
{
$lookup: {
from: "PA_DATA_REPORTING",
localField: "mobile",
foreignField: "mobile",
as: "result",
},
},
{
$unwind: "$result",
},
{
$replaceRoot: {
newRoot: "$result",
},
},
{
$match: {
customAppliedDate: {
$gte: ISODate("2022-11-14T00:00:00.000Z"),
$lt: ISODate("2022-11-20T00:00:00.000Z"),
},
},
},
{
$project: {
equal: {
$eq: ["$financierId", "$appliedFinancierId"],
},
doc: "$$ROOT",
},
},
{
$match: {
equal: true,
},
},
{
$group: {
_id: {
financierId: "$doc.financierId",
},
mobile: {
$addToSet: "$doc.mobile",
},
},
},
{
$unwind: "$mobile",
},
{
$group: {
_id: "$_id.financierId",
mobileCount: {
$sum: 1,
},
},
},
]
I tried adding a pipeline to the $lookup stage, but even that didn't help. Something like the below:
{
from: "PA_DATA_REPORTING",
localField: "mobile",
foreignField: "mobile",
pipeline: [
{
$match: {
customAppliedDate: {
$gte: ISODate("2022-11-14T00:00:00.000Z"),
$lt: ISODate("2022-11-18T00:00:00.000Z"),
},
},
},
],
as: "result",
}
Below is a sample document from the collection I am querying.
{
"_id": {
"$binary": {
"base64": "fURSsmgrcSh/xWN/ENWwiA==",
"subType": "03"
}
},
"enquiryId": "e4f22813-66f9-4a09-9e92-66bacd791943",
"mobile": "7945536728",
"financierId": {
"$numberLong": "280005"
},
"channelId": {
"$numberLong": "1000000"
},
"offerAvailable": "NO",
"offerCheckedDate": {
"$date": {
"$numberLong": "1640975400000"
}
},
"financierName": "Cholamandalam Finance",
"bankOfferAmount": {
"$numberLong": "10000"
},
"appliedFinancierId": {
"$numberLong": "280004"
},
"appliedFinancierName": "Cholamandalam Finance",
"paAppliedDate": {
"$date": {
"$numberLong": "1640975400000"
}
},
"paDisbursedDate": {
"$date": {
"$numberLong": "1640975400000"
}
},
"paSanctionedDate": {
"$date": {
"$numberLong": "1640975400000"
}
},
"customAppliedDate": {
"$date": {
"$numberLong": "1640975400000"
}
},
"customSanctionedDate": {
"$date": {
"$numberLong": "1640975400000"
}
},
"landedOnPAOfferPage": "NO",
"_class": "com.maruti.fmp.reporting.domain.document.PAOfferDocument"
}
Is there any way I can optimize the query and resolve the timeout error?
CodePudding user response:
First of all, because there is no information about the data in the database (what the inner objects look like, given the many unwind operations, or which indexes exist, since missing indexes lead to long execution times), the problem could be in several different places. I can only give some general advice on how to optimize queries.
Here is a workflow for solving performance problems:
- MongoDB has a tool for analyzing queries. Try using explain to check the critical part of your query.
- A collection scan means that every document in the collection must be read.
- An index scan limits the number of documents that must be inspected. Take a look at how to read explain output, with an example.
- Add indexes for the critical parts of your queries (do not index every field; index only the fields you actually look up). See more about query optimization.
It looks like one of the fields in the $match stage consumes a lot of time during query processing (check offerAvailable and channelId).
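The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the collection name PA_DATA_REPORTING is taken from the $lookup in the question, the helper name inspectAndIndex is made up for this example, and the two compound indexes are one plausible choice (equality fields first, the date range last), not a verified fix for this data set.

```javascript
// Filter from the first $match stage of the question.
// new Date(...) is used instead of ISODate(...) so the snippet also runs outside mongosh.
const firstMatch = {
  offerCheckedDate: {
    $gte: new Date("2022-11-14T00:00:00.000Z"),
    $lt: new Date("2022-12-20T00:00:00.000Z"),
  },
  offerAvailable: "YES",
  channelId: { $in: [1000001, 1000000] },
};

// Assumed index for the first $match: equality/$in fields first, range field last.
const matchIndex = { offerAvailable: 1, channelId: 1, offerCheckedDate: 1 };

// Assumed index for the $lookup on mobile plus the later customAppliedDate filter.
const lookupIndex = { mobile: 1, customAppliedDate: 1 };

// Hypothetical helper: captures the plan before indexing, then creates both indexes.
function inspectAndIndex(db) {
  const coll = db.getCollection("PA_DATA_REPORTING"); // assumed collection name
  const planBefore = coll
    .explain("executionStats")
    .aggregate([{ $match: firstMatch }]);
  coll.createIndex(matchIndex);
  coll.createIndex(lookupIndex);
  return planBefore;
}
```

In mongosh this would be called as inspectAndIndex(db). In the returned plan, a COLLSCAN stage or a totalDocsExamined value far above nReturned suggests the filter is not served by an index; running the same explain again after the indexes exist should show whether an IXSCAN is used instead.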
CodePudding user response:
I have no idea what you are trying to achieve; it is a bit difficult to tell with only one sample document. Anyway, this one returns the same result as your query:
db.collection.aggregate([
{
$match: {
offerCheckedDate: {
$gte: ISODate("2022-11-14T00:00:00.000Z"),
$lt: ISODate("2022-12-20T00:00:00.000Z"),
},
offerAvailable: "YES",
channelId: { $in: [1000001, 1000000] },
}
},
{ $match: { $expr: { $eq: ["$financierId", "$appliedFinancierId"] } } },
{
$group: {
_id: "$financierId",
mobiles: { $addToSet: "$mobile" },
}
},
{
$project: {
mobileCount: { $size: "$mobiles" }
}
}
])
Most likely it is not exactly what you are looking for, but in general it seems you do a lot of redundant or unnecessary work in your aggregation pipeline.
$setWindowFields may also be useful for your use case.
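To make concrete what the shortened pipeline in this answer computes, here is a plain JavaScript sketch of the same logic on a few made-up rows. The rows and the function name countDistinctMobiles are hypothetical, and the rows are assumed to have already passed the first $match (date range, offerAvailable, channelId).

```javascript
// Hypothetical rows; only the fields used after the first $match are included.
const docs = [
  { mobile: "111", financierId: 280005, appliedFinancierId: 280005 },
  { mobile: "111", financierId: 280005, appliedFinancierId: 280005 }, // same mobile again
  { mobile: "222", financierId: 280005, appliedFinancierId: 280005 },
  { mobile: "333", financierId: 280004, appliedFinancierId: 280005 }, // ids differ, dropped
  { mobile: "444", financierId: 280004, appliedFinancierId: 280004 },
];

// Mirrors the stages: $match with $expr/$eq, $group with $addToSet, $project with $size.
function countDistinctMobiles(rows) {
  const mobilesByFinancier = new Map();
  for (const row of rows) {
    // Equivalent of { $expr: { $eq: ["$financierId", "$appliedFinancierId"] } }
    if (row.financierId !== row.appliedFinancierId) continue;
    if (!mobilesByFinancier.has(row.financierId)) {
      mobilesByFinancier.set(row.financierId, new Set());
    }
    // A Set plays the role of $addToSet: duplicates are ignored.
    mobilesByFinancier.get(row.financierId).add(row.mobile);
  }
  // Equivalent of { mobileCount: { $size: "$mobiles" } }
  return [...mobilesByFinancier].map(([id, mobiles]) => ({
    _id: id,
    mobileCount: mobiles.size,
  }));
}

const result = countDistinctMobiles(docs);
console.log(result);
```

Because $addToSet already removes duplicates, taking $size of the set gives the distinct count directly, which is why the extra $unwind and second $group in the original pipeline are not needed for this part.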