I am looking to model the data for a DynamoDB table using the single table design.
I came across the following in the DynamoDB documentation under Recommendations for partition keys:
Use high-cardinality attributes. These are attributes that have distinct values for each item, like emailid, employee_no, customerid, sessionid, orderid, and so on.
If I understand single-table design correctly, it often goes against this recommendation, because the combination of partition key and sort key is used to model 1:n relationships. For instance, if I have a system with multiple blogs that contain posts, I could model it in one table with the following partition keys and sort keys:
Entity | Partition Key | Sort Key
---|---|---
Blog | [blog Id] | "blog"
Post | post#[blog Id] | [post Date]
Thus, many items could end up sharing the same partition key.
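A minimal in-memory sketch of the keys in the table above, in Python. The helper names (`blog_key`, `post_key`, `query_posts`) and the attribute names `PK`/`SK` are hypothetical, and the list of dicts only stands in for a real table. It shows the situation described: the partition key `post#b1` repeats, while every (partition key, sort key) pair stays unique.

```python
from collections import Counter

# Hypothetical helpers mirroring the table above; the "PK"/"SK" attribute
# names are assumptions, not something DynamoDB requires.
def blog_key(blog_id: str) -> dict:
    return {"PK": blog_id, "SK": "blog"}

def post_key(blog_id: str, post_date: str) -> dict:
    # ISO-8601 dates sort lexicographically in chronological order.
    return {"PK": f"post#{blog_id}", "SK": post_date}

def query_posts(items: list[dict], blog_id: str, start: str, end: str) -> list[dict]:
    # Roughly what a Query on PK = post#<blog id> AND SK BETWEEN start AND end returns.
    pk = f"post#{blog_id}"
    hits = [i for i in items if i["PK"] == pk and start <= i["SK"] <= end]
    return sorted(hits, key=lambda i: i["SK"])

items = [
    blog_key("b1"),
    post_key("b1", "2023-01-05"),
    post_key("b1", "2023-02-11"),
    post_key("b1", "2023-03-20"),
    blog_key("b2"),
    post_key("b2", "2023-01-09"),
]

pk_counts = Counter(i["PK"] for i in items)
primary_keys = {(i["PK"], i["SK"]) for i in items}
print(pk_counts["post#b1"])  # 3 items share this partition key
print(len(primary_keys) == len(items))  # True: every primary key is unique
print([i["SK"] for i in query_posts(items, "b1", "2023-02-01", "2023-12-31")])
# ['2023-02-11', '2023-03-20']
```

So the primary key has full cardinality even though the partition key alone does not, which is exactly the gap the question is about.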
My question is: how critical is it to follow the "high-cardinality partition keys" recommendation, especially if we can ensure that our primary key (partition key plus sort key) has high cardinality?
Answer:
Spreading your traffic across many distinct PK values makes it easier to use all of the DynamoDB capacity allotted to you. If everything shares one PK value, scaling is harder, because by default all items with the same PK value are stored in the same partition.
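This matters when you're doing thousands of reads or writes per second against items that share the same PK value: a single partition supports at most 3,000 read capacity units and 1,000 write capacity units per second.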
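To see why a single hot PK value limits throughput, here is a toy model in Python. It assumes a fixed, hypothetical partition count and uses MD5 only as a stand-in hash; DynamoDB's real hash function and partition management are internal, so only the shape of the result matters: identical PK values always land on the same partition, while many distinct values spread the load.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical; DynamoDB decides the real partition count

def partition_for(pk: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash (assumption): DynamoDB's internal hash is not public,
    # MD5 just gives a deterministic, well-mixed value for this model.
    digest = hashlib.md5(pk.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def max_partition_share(pks: list[str]) -> float:
    # Fraction of all requests absorbed by the busiest partition.
    load = Counter(partition_for(pk) for pk in pks)
    return max(load.values()) / len(pks)

# 10,000 writes that all target one blog's item collection.
hot = ["post#b1"] * 10_000
# 10,000 writes spread across 1,000 different blogs.
spread = [f"post#b{i % 1000}" for i in range(10_000)]

print(max_partition_share(hot))     # 1.0: every write lands on one partition
print(max_partition_share(spread))  # roughly 1 / NUM_PARTITIONS
```

In the hot case, adding partitions does not help at all, since the busiest partition still takes every request; that is why the per-partition limits become the ceiling.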
A blog id seems like a reasonable PK value. You'll be fine until you have a blogger who publishes hundreds of times per second or you're querying specifically against that blog id many thousands of times per second.
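If a single blog ever does get that hot, one commonly documented remedy is write sharding: appending a suffix to the partition key so that one blog's posts are spread over several PK values. The sketch below uses a calculated suffix derived from a hypothetical post id; `SHARDS`, the `post#<blog id>#<shard>` format and the in-memory `table` dict are all illustrative assumptions, not part of the design in the question.

```python
import hashlib
import heapq

SHARDS = 4  # hypothetical shard count; size it to the expected write rate

def shard_of(post_id: str) -> int:
    # Calculated (not random) suffix: the same post id always maps to the
    # same shard, so a single post can be fetched without trying every shard.
    return int(hashlib.md5(post_id.encode("utf-8")).hexdigest(), 16) % SHARDS

def sharded_pk(blog_id: str, post_id: str) -> str:
    return f"post#{blog_id}#{shard_of(post_id)}"

# In-memory stand-in for the table: partition key -> list of (post date, post id).
table: dict[str, list[tuple[str, str]]] = {}

def put_post(blog_id: str, post_id: str, post_date: str) -> None:
    table.setdefault(sharded_pk(blog_id, post_id), []).append((post_date, post_id))

def posts_by_date(blog_id: str) -> list[tuple[str, str]]:
    # One "query" per shard; each shard comes back sorted by sort key,
    # so heapq.merge yields the blog-wide chronological order.
    per_shard = [sorted(table.get(f"post#{blog_id}#{n}", [])) for n in range(SHARDS)]
    return list(heapq.merge(*per_shard))

for i, date in enumerate(["2023-03-01", "2023-01-15", "2023-02-10"]):
    put_post("b1", f"p{i}", date)

print([date for date, _ in posts_by_date("b1")])
# ['2023-01-15', '2023-02-10', '2023-03-01']
```

The trade-off is on the read side: listing a whole blog now takes one query per shard plus a merge, so sharding is only worth it for the rare blog id that actually approaches the per-partition limits.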