Home > database >  How to avoid SCAN while filtering on a marketplace app using Dynamodb?
How to avoid SCAN while filtering on a marketplace app using Dynamodb?

Time:10-11

Problem description: The app provides a list of products which can be purchased at some price and I want customers to be able to filter through all items by asking-price or estimated-value

Sample attributes: asking-price, posted-by, product-id, estimated-value, status. (status being available or sold)

Sample Data: $10.00, [email protected], 1234, $12.00

I thought of using approach 1:

  • one GSI with product-id as the partition key and asking-price as sort key
  • one GSI with product-id as the partition key and estimated-value as sort key

but the problem with this is that we don't know the partition key since we are looking through all items.

I also though of approach 2:

  • one GSI with status as the partition key and asking-price as sort key
  • one GSI with status as the partition key and estimated-value as sort key

but the problem is that the products are no longer unique

Lastly, I tried approach 3:

  • one GSI with status as the partition key and asking-price#product-id as sort key
  • one GSI with status as the partition key and estimated-value#product-id as sort key

but the problem here is that the greater than / less than conditions don't return accurate results since the sort key is now a string

It seems that the only answer left is to use SCANS with filterexpressions however I'm concerned with the costs of having a SCAN feature in the critical path of the application. Is there anyway around this? If not, would you recommend limits, max items, or parallel scans to improve performance? Should I go to a Relational DB?

CodePudding user response:

Your GSI approach 2 will work, items in a GSI do not need to be unique.

The only thing you need to be mindful of is using a low cardinality key (status). This would limit the throughput on your table. Each status would essentially be capped at 1000 WCU and 3000 RCU as that is the limit of a single partition.

If you need more throughput than that, then I would suggest using random numbers 0-N for your partition key. If you need to write at 10k WCU per second, then use random numbers 0-9 for the GSI PK. When you need to read, just run 10 parallel queries for each one of the keys.

  • Related