MongoDB subdocument structure best practices and queries

Time:02-24

I've seen two main types of schema for subdocuments:

{
    "cbill@boogiemail:com": {
        "outbound": [
            {
                "name": "First",
                "state": {
                    "saved": "[email protected]",
                    "edited": "[email protected]",
                    "status": "active"
                },
                "data": {
                }
            },
            {
                "name": "Second",
                "state": {
                    "saved": "[email protected]",
                    "edited": "[email protected]",
                    "status": "draft"
                },
                "data": {
                }
            }
        ],
        "inbound" : [
            {
                "name": "First",
                "state": {
                    "saved": "[email protected]",
                    "edited": "[email protected]",
                    "status": "active"
                },
                "data": {
                }
            },
            {
                "name": "Second",
                "state": {
                    "saved": "[email protected]",
                    "edited": "[email protected]",
                    "status": "draft"
                },
                "data": {
                }
            }
        ]
    }
}

The alternative structure is:

{
    "cbill@boogiemail:com": {
        "outbound": {
            "First": {
                "state": {
                    "saved": "[email protected]",
                    "edited": "[email protected]",
                    "status": "active"
                },
                "data": {
                }
            },
            "Second": {
                "state": {
                    "saved": "[email protected]",
                    "edited": "[email protected]",
                    "status": "draft"
                },
                "data": {
                }
            }
        },
        "inbound" : {
            "First": {
                "state": {
                    "saved": "[email protected]",
                    "edited": "[email protected]",
                    "status": "active"
                },
                "data": {
                }
            },
            "Second": {
                "state": {
                    "saved": "[email protected]",
                    "edited": "[email protected]",
                    "status": "draft"
                },
                "data": {
                }
            }
        }
    }
}

The main difference between the two is the structure of the inbound/outbound subdocuments.

What is the best practice for MongoDB subdocument structures? And in each case, what query would return the subdocument addressed by:

cbill@boogiemail:com.inbound.Second ?

To add a bit more information:

The collection will have many different documents starting with different email addresses, but each document in the collection will only have a few subdocuments under the inbound/outbound keys.

CodePudding user response:

You want to structure your collections and documents in a way that reflects how you intend to use the data. If you're going to do a lot of complex queries, especially with subdocuments, you might find it easier to split your documents up into separate collections. An example of this would be splitting comments from blog posts.

Your comments could be stored as an array of subdocuments:

# Example post document with comment subdocuments
{
    title: 'How to Mongo!',
    content: 'So I want to talk about MongoDB.',
    comments: [
        {
            author: 'Renold',
            content: "This post, it's amazing."
        },
        ...
    ]
}

This might cause problems, though, if you want to run complex queries on just the comments (e.g. picking the most recent comments across all posts, or getting all comments by one author). If you plan on making these kinds of queries, you'd be better off creating two collections: one for comments and the other for posts.
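To make that cost concrete, here is a rough sketch of what "all comments by one author" could look like while the comments are still embedded. The collection name `posts` and the variable names `author` and `commentsByAuthor` are assumptions for illustration; `$match`, `$unwind` and `$replaceRoot` are standard aggregation stages.

```javascript
// Sketch only: `posts`, `author` and `commentsByAuthor` are illustrative names.
const author = "Renold";

const commentsByAuthor = [
  // Keep only posts that contain at least one comment by this author.
  { $match: { "comments.author": author } },
  // Emit one document per embedded comment.
  { $unwind: "$comments" },
  // Drop the other authors' comments that came along with each post.
  { $match: { "comments.author": author } },
  // Promote the comment subdocument to the top level of the result.
  { $replaceRoot: { newRoot: "$comments" } },
];

// In mongosh this would be run as: db.posts.aggregate(commentsByAuthor)
console.log(JSON.stringify(commentsByAuthor, null, 2));
```

With a separate comments collection, the same question becomes a single find on the author field.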

# Example post document with "ForeignKeys" to comment documents
{
    _id: ObjectId("50c21579c5f2c80000000000"),
    title: 'How to Mongo!',
    content: 'So I want to talk about MongoDB.',
    comments: [
        ObjectId("50c21579c5f2c80000000001"),
        ObjectId("50c21579c5f2c80000000002"),
        ...
    ]
}

# Example comment document with a "ForeignKey" to a post document
{
    _id: ObjectId("50c21579c5f2c80000000001"),
    post_id: ObjectId("50c21579c5f2c80000000000"),
    author: 'Renold',
    content: "This post, it's amazing."
}

This is similar to how you'd store foreign keys in a relational database. Normalizing your documents like this makes it easy to query both comments and posts. Also, since you're breaking up your documents, each document will take up less memory. The trade-off is that you have to maintain the ObjectId references whenever either side changes (e.g. when you insert, update, or delete a comment or post). And since MongoDB does not enforce these references or cascade changes for you, all of this maintenance has to happen in your application.
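That maintenance might look roughly like this with the Node.js driver. The function names `addComment` and `removeComment` and the collection names `posts` and `comments` are assumptions for illustration, and `db` is expected to be an already connected driver `Db` handle. The two writes in each function are not atomic in this sketch, so a real application would need to decide what happens if the second one fails.

```javascript
// Sketch only: helper and collection names are hypothetical.
// `db` is assumed to be a connected MongoDB Node.js driver Db object.

async function addComment(db, postId, comment) {
  // Store the comment in its own collection with a back-reference to the post.
  const { insertedId } = await db
    .collection("comments")
    .insertOne({ ...comment, post_id: postId });

  // Keep the post's list of comment references in sync.
  await db
    .collection("posts")
    .updateOne({ _id: postId }, { $push: { comments: insertedId } });

  return insertedId;
}

async function removeComment(db, postId, commentId) {
  // Delete the comment document itself.
  await db.collection("comments").deleteOne({ _id: commentId });

  // Remove the now-dangling reference from the post.
  await db
    .collection("posts")
    .updateOne({ _id: postId }, { $pull: { comments: commentId } });
}
```

Whether to keep the `comments` array on the post at all is a design choice: with `post_id` on every comment, the post's comments can also be found by querying the comments collection, which leaves only one reference to maintain.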

On the other hand, if you don't plan on doing any complex queries on a document's subdocuments, you might benefit from storing monolithic objects. For instance, a user's address isn't something you're likely to query on its own:

# Example user document with address subdocument
{
    _id: ObjectId("50c21579c5f2c80000000421"),
    name: 'Howard',
    password: 'naughtysecret',
    address: {
        state: 'FL',
        city: 'Gainesville',
        zip: 32608
    }
}

CodePudding user response:

I found the answer here (https://www.tutorialspoint.com/how-to-select-a-specific-subdocument-in-mongodb), after some slight modifications.

The query for the second example (which was the one that I was most interested in) was:

find({ "cbill@boogiemail:com.inbound": {$exists: true}},{"cbill@boogiemail:com.inbound.Second":1}).pretty()

This results in:

{
    "_id" : ObjectId("6216a9940b84b1a642cb925e"),
    "cbill@boogiemail:com" : {
        "inbound" : {
            "Second" : {
                "state" : {
                    "saved" : "[email protected]",
                    "edited" : "[email protected]",
                    "status" : "draft"
                },
                "data" : {
                    
                }
            }
        }
    }
}

I'm not sure whether this is the most efficient query, so feel free to post any better alternatives.
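For the first (array) structure, one option is a positional projection. This is a sketch that has not been checked against a live server: the collection name `mailboxes` and the variable names are assumptions, and it relies on the `$` projection operator returning the first array element that matched the filter.

```javascript
// Sketch only: `mailboxes`, `owner`, `filter` and `projection` are illustrative names.
const owner = "cbill@boogiemail:com";

// Match documents whose inbound array has an element named "Second".
const filter = { [`${owner}.inbound.name`]: "Second" };

// Positional projection: keep only the first inbound element matched by the filter.
const projection = { [`${owner}.inbound.$`]: 1 };

// In mongosh this would be run as: db.mailboxes.find(filter, projection)
console.log(filter, projection);
```

With the second (keyed) structure no array search is needed, because the name is already part of the dotted path. Both structures use `:` in place of `.` in the email key, which matters here because a dot inside a key would be read as a path separator in queries like these.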
