I am scraping my local supermarket's product prices to keep an eye out for good offers. The data is saved in MongoDB, but queries are getting increasingly slow. There are now around 35k products indexed, with a total of about 1 million records, because the same product has a price history. A single record I insert looks something like this:
{
_id: ObjectId("626557a944ea574f69b49324"),
title: 'Nutella 21042022 2015 food',
category: 'kitchen',
date: '21042022',
datetime: ISODate("2022-04-21T12:15:00.000Z"),
href: 'http://www.website.com/product/345345',
name: 'Nutella 800g',
time: '2015',
website: 'www.website.com'
}
I have used an upsert on "title", which is unique on the supermarket's website.
const query = { title: title };
const update = { $set: { title, href, name, date, time, datetime /* ... */ } };
const options = { upsert: true };
const res = await collection.updateOne(query, update, options);
log("Added: " title);
Something like
db.supermarket.find({name: "Nutella 800g"}).count()
returns 176, but it takes a few seconds. I can live with that, but if the data scales 5x to 10x in size, the database becomes unusable. So there is obviously a better way to structure the data. What's the obvious thing I'm doing wrong?
Edit:
The only change to the indexes was adding one on "title".
db.supermarket.createIndex({title:1})
result:
[
{ v: 2, key: { _id: 1 }, name: '_id_' },
{ v: 2, key: { title: 1 }, name: 'title_1' }
]
CodePudding user response:
Have you created a unique index for your title? The index you mentioned in the comment seems to be a regular index, not a unique one. A unique index should speed up your upserts a lot.
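A minimal sketch in the mongo shell, assuming the collection is called supermarket as in your example. The existing non-unique title_1 index would need to be dropped first, since MongoDB won't create a second index with the same key pattern but different options:
db.supermarket.dropIndex("title_1")
db.supermarket.createIndex({ title: 1 }, { unique: true })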
To speed up your query, you can also create an index on the name field, which should help retrieve the documents more quickly.
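For example (again assuming the supermarket collection), something along these lines should let the count query use an index instead of scanning the whole collection:
db.supermarket.createIndex({ name: 1 })
// check the plan afterwards: it should show an IXSCAN stage rather than a COLLSCAN
db.supermarket.find({ name: "Nutella 800g" }).explain("executionStats")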