In MongoDB, if my queries do not involve any joins, can I assume that it will scale?-CodePudding

I have an APP that will be demanding in terms of pulling data. Each time a user logs in, data is pulled, each time a new page is visited data is pulled, etc.

Let's suppose that these queries will never involve joins.

Can I assume then that the queries will scale?

CodePudding user response：

No, it does not follow that using MongoDB and not using joins means "your queries will scale." That's a myth told by MongoDB marketing, not real software engineering.

It depends what your query is doing. Every query has a cost, no matter what brand of datastore you use. Every data access needs to use resources on the server, and that resource usage adds up. Do you queries scan thousands or millions of documents in the MongoDB datastore? Do they need to do map-reduce? How many documents are in the query response? Is it pulling data that is cached, or will it cost I/O overhead to pull that data? How many requests per second do you need to serve? Can MongoDB support the rate of queries you need to do? Are you configuring a MongoDB replica set or a sharded cluster? How many shards do you queries need to visit to get their result? How powerful are the servers hosting each node?

These are some examples of the types of questions you need to understand and analyze for your queries and your MongoDB cluster (the list is not complete).

You don't need to give me the answers to these questions. I'm just using them to illustrate why it's a naive question to ask "will it scale?"

It's like asking "I'm need to drive my car to my brother's house, will I have to refill my fuel tank?" That's not enough information to answer the question. How far away is your brother's house? What type of vehicle do you have? What is its fuel efficiency? Is your vehicle laden with a lot of heavy cargo? How many times do you need to make the trip? How fast are you driving? How rough are the roads on the route?

CodePudding user response：

There are probably many things to consider depending on your needs but i think the main difference comes from the data model (that MongoDB is made to support and scale on)

Document => more related data in 1 place

fewer joins (expensive especially if data are in different machines)
fewer transactions (single document updates are atomic)
simpler smaller schema, more tailored to your application
data model, similar to the way programmers save their data on objects(maps)/arrays

If you have many applications or too many different ways to access the same data, maybe you end up normalizing more your data to a more general data representation => losing some of the above benefits or duplicating some of your data to serve the different needs.