Problem description: The app provides a list of products which can be purchased at some price and I want customers to be able to filter through all items by asking-price
or estimated-value
Sample attributes: asking-price, posted-by, product-id, estimated-value, status.
(status being available or sold)
Sample Data:
$10.00, [email protected], 1234, $12.00
I thought of using approach 1:
- one GSI with
product-id
as the partition key andasking-price
as sort key - one GSI with
product-id
as the partition key andestimated-value
as sort key
but the problem with this is that we don't know the partition key since we are looking through all items.
I also though of approach 2:
- one GSI with
status
as the partition key andasking-price
as sort key - one GSI with
status
as the partition key andestimated-value
as sort key
but the problem is that the products are no longer unique
Lastly, I tried approach 3:
- one GSI with
status
as the partition key andasking-price#product-id
as sort key - one GSI with
status
as the partition key andestimated-value#product-id
as sort key
but the problem here is that the greater than / less than conditions don't return accurate results since the sort key is now a string
It seems that the only answer left is to use SCANS with filterexpressions however I'm concerned with the costs of having a SCAN feature in the critical path of the application. Is there anyway around this? If not, would you recommend limits, max items, or parallel scans to improve performance? Should I go to a Relational DB?
CodePudding user response:
Your GSI approach 2 will work, items in a GSI do not need to be unique.
The only thing you need to be mindful of is using a low cardinality key (status). This would limit the throughput on your table. Each status would essentially be capped at 1000 WCU and 3000 RCU as that is the limit of a single partition.
If you need more throughput than that, then I would suggest using random numbers 0-N for your partition key. If you need to write at 10k WCU per second, then use random numbers 0-9 for the GSI PK. When you need to read, just run 10 parallel queries for each one of the keys.