I am developing a Spring Boot REST API that has to fetch a large volume of data (1 to 2 GB) from DynamoDB based on a search condition and return it to the API consumer. Response time is not critical, but the volume of data is. Which is the right tool or service to use? Does DynamoDB pagination suffice for my requirement, or do I need some kind of streaming?
I tried DynamoDB pagination, but it takes a long time and does not scale well. I also tried using Java CompletableFuture to split the request across multiple threads, but this does not seem to be the correct solution.
CodePudding user response:
Reading gigabytes of data per request from DynamoDB does not seem scalable. Does the end user really need all that data, and for what purpose?
DynamoDB returns at most 1 MB per Query or Scan request, so for a single end-user API call you would have to make many paginated requests to DynamoDB.
If you are using Scan, then your solution is not scalable at all, and I would suggest considering a different database.
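To put the 1 MB page limit in numbers: 2 GB at 1 MB per page is at least 2,048 requests, and within a single Query or Scan they run one after another, because each request needs the key returned by the previous page. Below is a minimal sketch of that loop. It does not call the AWS SDK: `Page`, `forEachPage` and `fakeFetch` are hypothetical names, and `fakeFetch` is an in-memory stand-in for a request that would pass `ExclusiveStartKey` and read back `LastEvaluatedKey`. Each page is handed to a consumer as it arrives instead of being collected in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

class PagedRead {
    /** One page of results plus the key to resume from (null when there are no more pages). */
    record Page(List<String> items, Integer lastEvaluatedKey) {}

    /**
     * Fetches pages one by one, feeding the previous page's key back in as the start key,
     * and passes each page to onPage as soon as it arrives. Returns the number of pages read.
     */
    static int forEachPage(Function<Integer, Page> fetchPage, Consumer<List<String>> onPage) {
        int pages = 0;
        Integer startKey = null; // null means "start from the beginning"
        do {
            Page page = fetchPage.apply(startKey);
            onPage.accept(page.items());
            startKey = page.lastEvaluatedKey();
            pages++;
        } while (startKey != null);
        return pages;
    }

    /** Hypothetical stand-in for a table read: 10 items in total, 3 items per page. */
    static Page fakeFetch(Integer startKey) {
        int from = (startKey == null) ? 0 : startKey;
        int to = Math.min(from + 3, 10);
        List<String> items = new ArrayList<>();
        for (int i = from; i < to; i++) {
            items.add("item-" + i);
        }
        return new Page(items, to < 10 ? to : null);
    }

    public static void main(String[] args) {
        int pages = forEachPage(PagedRead::fakeFetch, items -> System.out.println(items));
        System.out.println("pages read: " + pages);
    }
}
```

Because the loop is sequential by construction, adding threads only helps if the read can be split into independent ranges, each with its own loop.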
CodePudding user response:
This is not a good use case for REST in general. Have you considered storing the query result in S3?
Your REST API would return a task ID that the client can then use to check the progress of the query and eventually download the result.
This scales much better, because the work no longer has to finish within a single HTTP request and you can run large numbers of parallel DynamoDB scans or queries in the background.
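A rough sketch of the task-ID idea, using only the Java standard library. Everything here is an assumption for illustration: `ExportTasks`, `submit` and `status` are hypothetical names, the in-memory map stands in for a persistent task store, and the `export` supplier stands in for code that would run the DynamoDB reads, write the result to S3, and return the object's location.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

class ExportTasks {
    enum State { RUNNING, DONE, FAILED }

    /** Status of one export; resultLocation is set only when the state is DONE. */
    record Status(State state, String resultLocation) {}

    // In-memory stand-in for a task store; a real service would persist this.
    private final Map<String, Status> tasks = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    /** Starts the export in the background and returns a task ID immediately. */
    String submit(Supplier<String> export) {
        String id = UUID.randomUUID().toString();
        tasks.put(id, new Status(State.RUNNING, null));
        pool.submit(() -> {
            try {
                // export is hypothetical: it would write the result to S3 and return its location.
                tasks.put(id, new Status(State.DONE, export.get()));
            } catch (RuntimeException e) {
                tasks.put(id, new Status(State.FAILED, null));
            }
        });
        return id;
    }

    /** What a status endpoint would return; null for an unknown task ID. */
    Status status(String id) {
        return tasks.get(id);
    }

    /** Stops accepting new tasks; exports that were already submitted still run. */
    void shutdown() {
        pool.shutdown();
    }
}
```

In a Spring Boot application, one endpoint would call `submit` and return the ID, and a second endpoint would call `status` so the client can poll until the result location is available.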