I have a DynnamoDB/NoSQL/MongoDB question. I am from an RDBMS backround and struggling to get a design right for a NoSQL DB. If anyone can help!
I have the following objects (tables in my terms):
- Organisation
- Users
- Courses
- Units
I want the following access points, of which most are achievable:
- Get/Create/Update and Delete Organisation
- Get/Create/Update and Delete Users
- Get/Create/Update and Delete Courses
Which I can achieve.
Issue is that the Users and Courses objects has many way to retrieve data:
- username
For example: List Users on course.
- List users for Org.
- List courses for Org.
- List users in org
- list users in unit
All these user secondary indexes, which I semi-understand, but I have tertiary..ish indexes, but that probably my design.
Coming from a relational methodology, I am not sure about reporting, how would it work if I wanted to do a search for all users under the courses which have not completed their (call it status flag)?
From what I understand I need indexes for everting I want to search by?
AWS DynamoDB is my preference, but another NoSQL I'll happily consider. I realise that I need more education regarding NoSQL, so please if anyone can provide good documentation and examples which help the learning process, that will be awesome.
Regards
Richard
I have watched a few UDEMY videos and been Gooling for many weeks (oh and checked here "obviously")
CodePudding user response:
Things to keep in mind
Partitions
In DynamoDB everything is organized in partitions that give you hash-based access to elements. This is very powerful in terms of performance but each partition has limits, so similarly to the hash function in hash maps the partition keys should try to equally distribute the elements
Single Table Design
Don't split the data into multiple tables. This makes everything harder and actually limits the capabilities of the DB. Store everything in a single table.
Keys
Keys in dynamo have to be designed around your access patterns. This is the hardest part.
You have Partition Key (Hash Key) -> this key has to be exactly specified every time. You can't perform a query without knowing the PK. This is why placing things like timestamps into PK is really bad idea.
Sort (Range) keys -> these are used for querying as specified in the AWS docs.
Attribute names
DB migrations are really hard in NoSQL so you have to use generic names for the attributes. They should not have any meaning.
For example "UserID" is a bad name for partition key, "PK" is a good name for partition key, same goes for all keys.
Indexes
You have two types of indexes, local and global.
Local Indexes are created once when you create the table and can't be changed (easily) afterward. You can only have a few of them. They give you an extra sort key to work with. The main benefit is that they are strongly consistent
Global Indexes can be created at any time. They give you both new partition key and sort key to work with but are eventually consistent. Got with global indexes unless you have a good reason to use local.
Going back to your problem, if we focus on one of the table as an example - Users
The user can be inserter like this (for example)
PK SK GSI1PK GSI1SK
Username#john123 Email#[email protected] Email#[email protected] Username#john123 <User Data>
This way you can query users by email and username. Keep in mind that PK and SK have to be unique pairs. SK in this case is free and can be used for other access patterns (which you didn't provide)
Another way might be to copy the data
PK SK
Username#john123 Email#[email protected] <user data>
Email#[email protected] Username#john123 <user data>
this way you avoid having to deal with indexes (which might be expensive sometimes) but you have to manually keep the user data consistent.
Further reading -> https://www.alexdebrie.com/posts/dynamodb-single-table/