DynamoDB sorting through data

Time:12-30

Everywhere I look, the web is telling me to never use scan() in DynamoDB: it consumes all your capacity units, each response is capped at 1 MB, etc.

I’ve looked at querying, but that doesn’t achieve what I want either.

How am I supposed to parse through my table?

Here is my setup: I have a table “people” with one item per person.

I have attributes “email” (partition key), “fName”, “lName”, “displayName”, “passwordHash”, and “subscribed”.

subscribed is either true or false, and I need to sort through every person who is subscribed.

I can’t use a sort key because all emails are unique…

It is my understanding that DynamoDB data is organized as follows:

primary key-

—sort key 1

——— Item 1

—sort key 2

——- Item 2

primary key 2

—sort key 1

..etc..

So setting subscribed as a sort key would not work… I would still need to loop through every primary key.

Right now I am just getting every item with a filterExpression to check if someone is subscribed. If they are, they pass. But what happens when I have hundreds of users, whose data exceeds 1 MB?

I wouldn’t get every subscribed user in this case, and sending repeated requests with the start key to fetch each 1 MB page of data seems too heavy for the processor and would slow the server down significantly.

Are there any recommendations for how I should go about getting every subscribed user?

Note: subscribed cannot be the partition key with email as the sort key, because I have cases where I need just one user, which is easy to access if the email is the partition key.

CodePudding user response:

Right now I am just getting every item with a filterExpression to check if someone is subscribed. If they are, they pass. But what happens when I have hundreds of users, whose data exceeds 1 MB?

GetItem for single person lookups

Ideally you should use GetItem here, providing the user's email as the key, and then check whether that user is subscribed. Scanning the table to see if an individual is subscribed does not scale.
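
As a rough sketch of that single-person lookup, assuming `table` is a boto3 DynamoDB Table resource for the “people” table (for example `boto3.resource("dynamodb").Table("people")`); the helper name `is_subscribed` is made up for illustration:

```python
def is_subscribed(table, email):
    """Look up one person by partition key and report whether they are subscribed.

    `table` is assumed to be a boto3 Table resource (or anything exposing
    the same get_item(Key=...) method).
    """
    response = table.get_item(Key={"email": email})
    item = response.get("Item")  # "Item" is absent when no such key exists
    if item is None:
        return False
    # Accept either a boolean attribute or the string "true" used for a sparse index.
    return item.get("subscribed") in (True, "true")
```

This reads exactly one item, so the cost does not grow with the size of the table.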

Pagination

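When the data exceeds 1 MB you simply paginate: each response includes a LastEvaluatedKey, which you pass as ExclusiveStartKey on the next request until no key is returned. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html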
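
A minimal sketch of that pagination loop, again assuming `table` is a boto3 Table resource; `scan_all` is a hypothetical helper name, not part of boto3:

```python
def scan_all(table, **scan_kwargs):
    """Collect every item from a Scan by following LastEvaluatedKey.

    Extra keyword arguments (FilterExpression, IndexName, ...) are passed
    through to table.scan unchanged.
    """
    items = []
    kwargs = dict(scan_kwargs)
    while True:
        page = table.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            # No key returned: this was the final page.
            return items
        # Resume the next request where this page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

The loop itself is cheap for the server; the real cost is the read capacity a full Scan consumes, which is what the index-based approach addresses.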


Are there any recommendations for how I should go about getting every subscribed user?

Sparse Indexes

For this use case it's best to use a sparse index: set subscribed="true" only when the user is subscribed, and leave the attribute off the item entirely when they are not (it must be a string, because a boolean can't be used as a key attribute).

Once you do so, you can create a GSI on the attribute subscribed. Only the items where it is set are contained in the GSI, which makes it sparse. A Scan on that index is then as efficient as it can be, although having a single partition key value in the index limits its write throughput to 1000 WCU.
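
A sketch of how the sparse attribute might be maintained and read, assuming `table` is a boto3 Table resource and that a GSI keyed on subscribed already exists. The index name "subscribed-index" and both helper names are assumptions for illustration:

```python
def set_subscription(table, email, subscribed):
    """Write subscribed="true" when subscribed, otherwise remove the attribute.

    Removing the attribute (rather than storing "false") is what keeps the
    item out of the index and therefore keeps the index sparse.
    """
    if subscribed:
        table.update_item(
            Key={"email": email},
            UpdateExpression="SET #s = :s",
            ExpressionAttributeNames={"#s": "subscribed"},
            ExpressionAttributeValues={":s": "true"},
        )
    else:
        table.update_item(
            Key={"email": email},
            UpdateExpression="REMOVE #s",
            ExpressionAttributeNames={"#s": "subscribed"},
        )


def list_subscribed(table, index_name="subscribed-index"):
    """Scan the sparse GSI page by page; it only contains subscribed people."""
    items = []
    kwargs = {"IndexName": index_name}
    while True:
        page = table.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key
```

Which attributes come back from the index depends on the projection chosen when the GSI was created.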

Making things scalable

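An even better way is to create an attribute called GSI_PK and assign it a random number, typically drawn from a small fixed range so that you know which values to read back. Use GSI_PK as the index partition key and subscribed as the sort key, again as a string that is set only when true. The writes are then spread over several partition key values, so the index no longer becomes a bottleneck that limits your throughput to 1000 WCU because of a single partition key value.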
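
One way this sharded layout could look in code, as a sketch only. It assumes `table` is a boto3 Table resource, a GSI with GSI_PK (number) as partition key and subscribed as sort key, and a shard count of 10; `SHARD_COUNT`, the index name, and the helper names are all hypothetical:

```python
import random

SHARD_COUNT = 10  # assumed number of GSI_PK values; pick to match write volume


def shard_attributes():
    """Attributes to add to an item when the person subscribes."""
    return {"GSI_PK": random.randrange(SHARD_COUNT), "subscribed": "true"}


def query_all_shards(table, index_name="subscribed-sharded-index"):
    """Query every shard of the index and merge the results."""
    items = []
    for shard in range(SHARD_COUNT):
        kwargs = {
            "IndexName": index_name,
            "KeyConditionExpression": "GSI_PK = :pk",
            "ExpressionAttributeValues": {":pk": shard},
        }
        while True:
            page = table.query(**kwargs)
            items.extend(page.get("Items", []))
            last_key = page.get("LastEvaluatedKey")
            if not last_key:
                break  # this shard is exhausted; move on to the next one
            kwargs["ExclusiveStartKey"] = last_key
    return items
```

Items that lack either GSI_PK or subscribed do not appear in the index, so it stays sparse. A paginated Scan of the index would also return every subscribed person without needing to know the shard values.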

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes-general-sparse-indexes.html
