How to use StartingToken with DynamoDB pagination scan

Time:12-31

I have a DynamoDB table and I want to output items from it to a client using pagination. I thought I'd use DynamoDB.Paginator.Scan and supply StartingToken; however, I don't see NextToken in the output of either the page or the iterator itself. So how do I get it?

My goal is a REST API where the client requests the next X items from a table, supplying StartingToken to iterate. Initially there's no token, but with each response the server returns a NextToken, which the client supplies as StartingToken to get the next X items.

import boto3
import json
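table = "TableName"
client = boto3.client("dynamodb")
# "scan" rather than "query": the paginate() call below passes no key condition,
# which a Query would require, and the question is about Paginator.Scan.
paginator = client.get_paginator("scan")
token = None
size = 1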

for i in range(1,10):
    client.put_item(TableName=table, Item={"PK":{"S":str(i)},"SK":{"S":str(i)}})

it = paginator.paginate(
    TableName=table,
    ProjectionExpression="PK,SK",
    PaginationConfig={"MaxItems": 100, "PageSize": size, "StartingToken": token}
)

for page in it:
    print(json.dumps(page, indent=2))
    break

As a side note - how do I get one page from paginator without using break/for? I tried using next(it) but it does not work.

Here's the it object (its attributes):

{
'_input_token': ['ExclusiveStartKey'],
 '_limit_key': 'Limit',
 '_max_items': 100,
 '_method': <bound method ClientCreator._create_api_method.<locals>._api_call of <botocore.client.DynamoDB object at 0x000001CBA5806AA0>>,
 '_more_results': None,
 '_non_aggregate_key_exprs': [{'type': 'field', 'children': [], 'value': 'ConsumedCapacity'}],
 '_non_aggregate_part': {'ConsumedCapacity': None},
 '_op_kwargs': {'Limit': 1,
                'ProjectionExpression': 'PK,SK',
                'TableName': 'TableName'},
 '_output_token': [{'type': 'field', 'children': [], 'value': 'LastEvaluatedKey'}],
 '_page_size': 1,
 '_result_keys': [{'type': 'field', 'children': [], 'value': 'Items'},
                  {'type': 'field', 'children': [], 'value': 'Count'},
                  {'type': 'field', 'children': [], 'value': 'ScannedCount'}],
 '_resume_token': None,
 '_starting_token': None,
 '_token_decoder': <botocore.paginate.TokenDecoder object at 0x000001CBA5D81960>,
 '_token_encoder': <botocore.paginate.TokenEncoder object at 0x000001CBA5D82290>
}

And the page:

{
  "Items": [
    {
      "PK": {
        "S": "2"
      },
      "SK": {
        "S": "2"
      }
    }
  ],
  "Count": 1,
  "ScannedCount": 1,
  "LastEvaluatedKey": {
    "PK": {
      "S": "2"
    },
    "SK": {
      "S": "2"
    }
  },
  "ResponseMetadata": {
    "RequestId": "DBE4ON8SI0GOTS2RRO2OG43QJVVV4KQNSO5AEMVJF66Q9ASUAAJG",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "server": "Server",
      "date": "Fri, 30 Dec 2022 11:37:52 GMT",
      "content-type": "application/x-amz-json-1.0",
      "content-length": "121",
      "connection": "keep-alive",
      "x-amzn-requestid": "DBE4ON8SI0GOTS2RRO2OG43QJVVV4KQNSO5AEMVJF66Q9ASUAAJG",
      "x-amz-crc32": "973385738"
    },
    "RetryAttempts": 0
  }
}

I thought I could pass LastEvaluatedKey as the StartingToken, but that throws an error. I also tried to build the token like this, but it did not work:

it._token_encoder.encode(page["LastEvaluatedKey"])

I also thought about using just scan without iterator, but I'm actually outputting a very filtered result-set. I need to set Limit to a very large value to get results and I don't want too many results at the same time. Is there a way to scan up to 1000 items but stop as soon as 10 items are found?

CodePudding user response:

I would suggest not using the paginator and instead calling the lower-level Query (or Scan) operation directly. The reason is that NextToken and LastEvaluatedKey are easy to confuse, and they are not interchangeable:

  • LastEvaluatedKey (returned by the service) is passed back as ExclusiveStartKey
  • NextToken (produced by the paginator) is passed back as StartingToken
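
If you do stay with the paginator, the NextToken/StartingToken pairing above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: fetch_page and its parameter names are hypothetical, and it assumes paginator is the object returned by client.get_paginator("scan"). It relies on build_full_result(), which includes a NextToken entry when MaxItems cut the results short.

```python
def fetch_page(paginator, table_name, page_items=10, starting_token=None,
               scan_batch=1000, **scan_kwargs):
    """Return (items, next_token) for one client-facing page.

    Hypothetical helper. `paginator` is assumed to come from
    boto3.client("dynamodb").get_paginator("scan").
    """
    result = paginator.paginate(
        TableName=table_name,
        PaginationConfig={
            "MaxItems": page_items,           # stop after this many returned items
            "PageSize": scan_batch,           # sent to DynamoDB as Limit per request
            "StartingToken": starting_token,  # None on the first call
        },
        **scan_kwargs,                        # e.g. FilterExpression, ProjectionExpression
    ).build_full_result()
    # NextToken is absent when the scan reached the end of the table.
    return result.get("Items", []), result.get("NextToken")
```

With a real client this would be called as fetch_page(boto3.client("dynamodb").get_paginator("scan"), "TableName"), and the returned token handed back as starting_token on the next request. As far as I can tell, MaxItems counts the entries in Items rather than the items scanned, so a large PageSize with a small MaxItems also covers the "scan many, return few" case from the question. The same token is exposed as the iterator's resume_token attribute once iteration has finished. On the side note: the object returned by paginate() is an iterable, not an iterator, so next(iter(it)) rather than next(it) yields the first page.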

It's preferable to use the resource interface (boto3.resource), which I believe leaves no confusion about how to paginate:

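import boto3

# The region and table name are placeholders; replace them with your own.
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')

table = dynamodb.Table('my-table')

# scan() is used here to match the question; query() paginates the same way
# but additionally requires a KeyConditionExpression.
response = table.scan()
data = response['Items']

# LastEvaluatedKey indicates that there are more results
while 'LastEvaluatedKey' in response:
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    data.extend(response['Items'])  # Items is a list, so extend rather than update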
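
For the REST API described in the question, the LastEvaluatedKey from a call like the one above has to travel to the client and come back on the next request. One possible way to do that is to wrap it in an opaque, URL-safe string. The helper names encode_token and decode_token are hypothetical, and the sketch assumes the key values are JSON-serializable (string keys, as in the question); Decimal or binary key values would need extra handling.

```python
import base64
import json


def encode_token(last_evaluated_key):
    """Turn a LastEvaluatedKey dict into an opaque URL-safe string (or None)."""
    if not last_evaluated_key:
        return None
    raw = json.dumps(last_evaluated_key, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token):
    """Reverse encode_token; the result can be passed as ExclusiveStartKey."""
    if not token:
        return None
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
```

The server would return encode_token(response.get('LastEvaluatedKey')) with each page and pass decode_token(token) as ExclusiveStartKey when the client sends the token back. Since the client can tamper with the string, the decoded value should be treated as untrusted input.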

CodePudding user response:

The LastEvaluatedKey is in the response object and can be set as the ExclusiveStartKey in the scan.
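
The question also asks how to scan in large batches but stop as soon as enough matching items are found. Feeding LastEvaluatedKey back as ExclusiveStartKey makes that possible with a plain loop, sketched below. The name scan_until and its parameters are hypothetical; scan is assumed to be client.scan or table.scan with TableName and any filter already bound or passed through scan_kwargs. The sketch also assumes a base-table scan whose returned items include the key attributes (PK and SK in the question), so that the key of the last returned item can serve as the resume point.

```python
def scan_until(scan, wanted=10, start_key=None, batch=1000,
               key_names=("PK", "SK"), **scan_kwargs):
    """Scan `batch` items per request, stopping once `wanted` items are collected.

    Returns (items, resume_key). resume_key is None when the table is exhausted.
    """
    items = []
    kwargs = dict(scan_kwargs, Limit=batch)
    if start_key:
        kwargs["ExclusiveStartKey"] = start_key
    while True:
        response = scan(**kwargs)
        for item in response.get("Items", []):
            items.append(item)
            if len(items) == wanted:
                # Assumption: resuming right after this item by using its
                # primary key as the next ExclusiveStartKey.
                return items, {name: item[name] for name in key_names}
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return items, None  # nothing left to scan
        kwargs["ExclusiveStartKey"] = last_key
```

A call such as scan_until(client.scan, wanted=10, TableName="TableName", ProjectionExpression="PK,SK") would return at most 10 items plus a key to resume from. If the last collected item happens to be the final item in the table, the next call simply returns an empty list and None.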

Sample code showing this can be found in the AWS DynamoDB Sample code (here, for example)
