Home > front end >  DynamoDB Sort key so that I could query the last item by date
DynamoDB Sort key so that I could query the last item by date

Time:01-29

The following DynamoDB table has a partition key equal to date so I could easily query day by day. I know items with the same partition key are stored in the same partition and are ordered by their sort key.

However, the sort key in this case is wrong because TransferId is not something like a date that I could sort.

What sort key should I use so I could query the last item for a specific date? 'Exchange:Time'?

 ---------- ------------------------------------------ --------- -------------------------------------------- ------------ ---------- ------------ --------- ----------- --------------------------- ---------------- -------------------------------------------------------------------- ---------------------------------- -------------- ------------------------------ 
|    PK    |                    SK                    | Account |                  Address                   | AddressTag | Exchange | Instrument | Network | Quantity  |           Time            | TransactionFee |                           TransactionId                            |            TransferId            | TransferType |          UpdatedAt           |
 ---------- ------------------------------------------ --------- -------------------------------------------- ------------ ---------- ------------ --------- ----------- --------------------------- ---------------- -------------------------------------------------------------------- ---------------------------------- -------------- ------------------------------ 
| 20221217 | Binance:1b56485f6a3446c3b883f4f485039260 | main    | 0xF76d3f20bF155681b0b983bFC3ea5fe43A2A6E3c | null       | Binance  | USDT       | ETH     | 97.500139 | 2022-12-17T14:59:12 00:00 |            3.2 | 0x46d28f7d0e1e5b1d074a65dcfbb9d90b3bcdc7e6fca6b1f1f7abb5ab219feb24 | 1b56485f6a3446c3b883f4f485039260 |            0 | 2023-01-27T18:00:00.5930186Z |
| 20221217 | Binance:4747f6ecc74f4dd8a4b565e0f15bcf79 | main    | 0xF76d3f20bF155681b0b983bFC3ea5fe43A2A6E3c | null       | Binance  | USDT       | ETH     | 3107.4889 | 2022-12-17T15:38:23 00:00 |            3.2 | 0xbb2b92030b988a0184ba02e2e754b7a7f0f963c496c4e3473509c6fe6b54a41d | 4747f6ecc74f4dd8a4b565e0f15bcf79 |            0 | 2023-01-27T18:00:00.2944534Z |
 ---------- ------------------------------------------ --------- -------------------------------------------- ------------ ---------- ------------ --------- ----------- --------------------------- ---------------- -------------------------------------------------------------------- ---------------------------------- -------------- ------------------------------ 

I suppose the code should be something like the following:

var date = DateTimeOffset.UtcNow.ToString("yyyyMMdd");

var request = new QueryRequest
{
    TableName = "transfers-dev",
    KeyConditionExpression = "#date = :date",
    ExpressionAttributeNames = new Dictionary<string, string> { { "#date", "PK" } },
    ExpressionAttributeValues = new Dictionary<string, AttributeValue> { { ":date", new AttributeValue { S = date } } },
    ScanIndexForward = false,
    Limit = 1
};
var response = await client.QueryAsync(request);

return response.Items.First();

CodePudding user response:

Date as a partition key is not very useful, it essentially caps your throughput to 1000 WCU and also doesn't give your query flexibility like give me the data for the last 18 hours.

You should choose something useful as partition key, and choose date as your sort key.

PK SK data gs_pk
Binance:1b56485f6a3446c3b883f4f485039260 2022-12-17T14:59:12 00:00 data 1
Binance:1b56485f6a3446c3b883f4f485039260 2022-12-18T14:59:12 00:00 data 1
Binance:1b56485f6a3446c3b883f4f485039260 2022-12-19T14:59:12 00:00 data 1

You can now say give me all the data for a given binance in a given time frame. If you require a lookup where you want all the binances for the last given amount of time you can create a GSI and use a random or static value as partition key. They more keys you use the more throughput you will be capable of consuming.

gsi_pk SK PK data
1 2022-12-17T14:59:12 00:00 Binance:1b56485f6a3446c3b883f4f485039260 data
1 2022-12-17T14:59:12 00:00 Binance:1b56485f6a3446c3b883f4f485039260 data
1 2022-01-26T14:59:12 00:00 Binance:1b56485f6a3446c3b883f4f485039260 data

Get most recent data:

SELECT * FROM MYTABLE.MYINDEX WHERE gsi_pk = 1 DESC

Get data for a given day:

SELECT * FROM MYTABLE.MYINDEX WHERE gsi_pk = 1 AND SK BEGINS_WITH '26.01.2022'

Note

Having a single value for your GSI partition key will also limit your scalability, if you intend on writing more than 1000 items per second (assuming items < 1KB) then you will have to shard the index key.

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes-gsi-sharding.html

  • Related