Dynamo DB queries on Secondary Index


I have a use case where I fetch data for certain items (each with a unique itemID) multiple times a day (each fetch identified by day_BatchTime) and store it in DynamoDB. My composite primary key consists of itemID and day_BatchTime: itemID is the partition key and day_BatchTime is the sort key.

But I also need to report on each day's data, so I set up a global secondary index on feedDate. Queries against this index run a bit slowly in the AWS console, and I get an error when executing the query below in a Lambda function using Python. Here are the relevant snippets:

response = table.query(KeyConditionExpression=Key('feedDate').eq('18-03-2022'))

"errorMessage": "An error occurred (ValidationException) when calling the Query operation: Query condition missed key schema element: itemID"

The table has about 53,000 items, the global secondary index is populated for about 31,000 of them, and I am querying for the roughly 6,000 items that are updated in a day. The query execution time appears to be much higher than one would normally expect.

Below are my global secondary index details.

  • Name: feedDate-index
  • Status: Active
  • Partition key: feedDate (String)
  • Sort key: -
  • Read capacity Range: 1 - 10
    • Auto scaling at 70%
    • Current provisioned units: 1
  • Write capacity Range: 1 - 10
    • Auto scaling at 70%
    • Current provisioned units: 1
  • Size: 8.9 MB, Item count: 31,737

Please let me know if I am missing something.

CodePudding user response:

As pointed out by @hoangdv in the comments, you forgot to add the index name to the query. By default, Query reads from the base table, so you need to point it at the global secondary index explicitly via the IndexName parameter.

Something like this should do the trick:

from boto3.dynamodb.conditions import Key

response = table.query(
    IndexName="feedDate-index",
    KeyConditionExpression=Key('feedDate').eq('18-03-2022')
)

Concerning your perceived performance issues, those are difficult to address without concrete numbers and data. On a general note, a single Query call returns at most 1 MB of data; when more data matches, the response includes a pagination token (LastEvaluatedKey) that you pass as ExclusiveStartKey in a follow-up call. For your roughly 6,000 items you should expect several sequential API calls, each adding latency.
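To collect all pages, you loop until the response no longer contains a LastEvaluatedKey. A minimal sketch of such a helper (the function name query_all_pages is my own; it works with any boto3 Table resource, such as the table and feedDate-index from your setup):

```python
def query_all_pages(table, **query_kwargs):
    """Collect every page of a DynamoDB Query by following LastEvaluatedKey.

    `table` is a boto3 Table resource (anything exposing .query() works).
    """
    items = []
    while True:
        response = table.query(**query_kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break  # no more pages
        # resume the next page where the previous one stopped
        query_kwargs["ExclusiveStartKey"] = last_key
    return items
```

You would then call it with the same arguments as a plain query, e.g. query_all_pages(table, IndexName="feedDate-index", KeyConditionExpression=Key('feedDate').eq('18-03-2022')).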

The source of the query and the complexity of the data may also impact performance. For example, a tiny Lambda function with 128 MB of RAM will take a lot longer to deserialize items than one configured with more memory (which also receives proportionally more CPU). I wrote a blog post about this topic a while ago if you're curious (disclaimer: written by me, relevant to the topic).
