NoSQL Database Design with multile indexes-CodePudding

I have a DynnamoDB/NoSQL/MongoDB question. I am from an RDBMS backround and struggling to get a design right for a NoSQL DB. If anyone can help!

I have the following objects (tables in my terms):

Organisation
Users
Courses
Units

I want the following access points, of which most are achievable:

Get/Create/Update and Delete Organisation
Get/Create/Update and Delete Users
Get/Create/Update and Delete Courses

Which I can achieve.

Issue is that the Users and Courses objects has many way to retrieve data:

email
username

For example: List Users on course.

List users for Org.
List courses for Org.
List users in org
list users in unit

All these user secondary indexes, which I semi-understand, but I have tertiary..ish indexes, but that probably my design.

Coming from a relational methodology, I am not sure about reporting, how would it work if I wanted to do a search for all users under the courses which have not completed their (call it status flag)?

From what I understand I need indexes for everting I want to search by?

AWS DynamoDB is my preference, but another NoSQL I'll happily consider. I realise that I need more education regarding NoSQL, so please if anyone can provide good documentation and examples which help the learning process, that will be awesome.

Regards

Richard

I have watched a few UDEMY videos and been Gooling for many weeks (oh and checked here "obviously")

CodePudding user response：

Things to keep in mind

Partitions

In DynamoDB everything is organized in partitions that give you hash-based access to elements. This is very powerful in terms of performance but each partition has limits, so similarly to the hash function in hash maps the partition keys should try to equally distribute the elements

Single Table Design

Don't split the data into multiple tables. This makes everything harder and actually limits the capabilities of the DB. Store everything in a single table.

Keys

Keys in dynamo have to be designed around your access patterns. This is the hardest part.

You have Partition Key (Hash Key) -> this key has to be exactly specified every time. You can't perform a query without knowing the PK. This is why placing things like timestamps into PK is really bad idea.

Sort (Range) keys -> these are used for querying as specified in the AWS docs.

Attribute names

DB migrations are really hard in NoSQL so you have to use generic names for the attributes. They should not have any meaning.

For example "UserID" is a bad name for partition key, "PK" is a good name for partition key, same goes for all keys.

Indexes

You have two types of indexes, local and global.

Local Indexes are created once when you create the table and can't be changed (easily) afterward. You can only have a few of them. They give you an extra sort key to work with. The main benefit is that they are strongly consistent

Global Indexes can be created at any time. They give you both new partition key and sort key to work with but are eventually consistent. Got with global indexes unless you have a good reason to use local.

Going back to your problem, if we focus on one of the table as an example - Users

The user can be inserter like this (for example)

PK               SK                   GSI1PK               GSI1SK
Username#john123 Email#[email protected] Email#[email protected] Username#john123  <User Data>

This way you can query users by email and username. Keep in mind that PK and SK have to be unique pairs. SK in this case is free and can be used for other access patterns (which you didn't provide)

Another way might be to copy the data

PK                   SK
Username#john123     Email#[email protected]  <user data>
Email#[email protected] Username#john123      <user data>

this way you avoid having to deal with indexes (which might be expensive sometimes) but you have to manually keep the user data consistent.

Further reading -> https://www.alexdebrie.com/posts/dynamodb-single-table/

-> my medium post