Extremely slow batch_write operations with boto3 and dynamodb-local

Time:09-02

I am trying to load several GB of data, stored locally in 6 txt files, into tables in a Dockerized local DynamoDB instance, using Python 3 and the boto3 library.

The problem is the speed of the process: the estimated time to load a single file (~10M lines) is 19 hours. I used a profiler to find the bottleneck, and most of the run time is spent in the boto3 function that stores the items in the database.
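To put the estimate in perspective, here is a rough back-of-the-envelope calculation. It assumes one item per line, exactly 10M lines (the question only says "~10M"), and the 25-item batch size mentioned later in the question.

```python
# Rough throughput implied by the 19-hour estimate (all inputs are approximations).
lines = 10_000_000            # "~10M lines", assumed to be one item per line
estimated_seconds = 19 * 3600
batch_size = 25               # batch size stated in the question

items_per_second = lines / estimated_seconds
seconds_per_batch = estimated_seconds / (lines / batch_size)

print(f"{items_per_second:.0f} items/s, {seconds_per_batch * 1000:.0f} ms per batch call")
# prints: 146 items/s, 171 ms per batch call
```

Under those assumptions, each batch_write_item call takes on the order of 170 ms, which is consistent with the profiler pointing at that call.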

    def add_batch(self, items, table):
        # Reject anything that is not a list before building the request.
        if not isinstance(items, list):
            print(f"\nError while loading item batch: expected a <list> but got {type(items)}")
            return None
        # BatchWriteItem expects {table_name: [{'PutRequest': {'Item': ...}}, ...]}.
        request = {
            table: [{'PutRequest': {'Item': item}} for item in items]
        }
        # Resubmit until DynamoDB reports no unprocessed items.
        while request:
            response = self._client.batch_write_item(RequestItems=request)  # by far the slowest call
            if response['UnprocessedItems']:
                request = response['UnprocessedItems']
                print('unprocessed items: ', request)
            else:
                request = None
        return 0
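The loop above resubmits unprocessed items immediately. The AWS documentation for BatchWriteItem recommends retrying them with exponential backoff instead. A minimal sketch of that idea follows; the function name `write_with_backoff`, the retry count, and the base delay are illustrative assumptions, and `client` is expected to be a boto3 DynamoDB client (or anything with a compatible `batch_write_item` method).

```python
import time


def write_with_backoff(client, request_items, max_retries=8, base_delay=0.05,
                       sleep=time.sleep):
    """Send one BatchWriteItem request, retrying unprocessed items with backoff.

    Returns the items still unprocessed after the last attempt
    (an empty dict when everything was written).
    """
    pending = request_items
    for attempt in range(max_retries + 1):
        response = client.batch_write_item(RequestItems=pending)
        pending = response.get('UnprocessedItems') or {}
        if not pending:
            return {}
        if attempt < max_retries:
            # Wait 0.05 s, 0.1 s, 0.2 s, ... before resubmitting the leftovers.
            sleep(base_delay * (2 ** attempt))
    return pending
```

This will not make DynamoDB Local itself faster; it only avoids hammering the service in a tight loop when items come back unprocessed.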

The batch size is 25 items, and the table's throughput is set to 100 (I tried many other values, with little effect).
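For reference, 25 is the maximum number of put or delete requests that a single BatchWriteItem call accepts. The question does not show how the files are split into batches, so the following is only one possible sketch; the helper name `chunked` is hypothetical.

```python
from itertools import islice


def chunked(iterable, size=25):
    """Yield successive lists of at most `size` elements from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Example: 60 items are split into batches of 25, 25 and 10.
print([len(batch) for batch in chunked(range(60))])
# prints: [25, 25, 10]
```

Because `chunked` works on any iterable, it can consume an open file line by line without reading the whole file into memory.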

Understandably, I got better results when the container was run with the InMemory option set to True. I had to change that because I can't wait hours for the data to reload every time I restart the container. At the moment I start the container with this simple command: docker run -p 8000:8000 amazon/dynamodb-local -jar DynamoDBLocal.jar

I tried some parallelization, but the boto3 library doesn't seem to like it: it keeps raising exceptions.
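The question does not say which exceptions were raised. One common cause, assumed here, is sharing a boto3 resource or session between threads: the boto3 documentation describes low-level clients as generally thread safe, but not resources or sessions. Under that assumption, a thread pool that shares a single client could look like the sketch below; `write_batches_parallel` and `max_workers=8` are illustrative choices, not something from the original post.

```python
from concurrent.futures import ThreadPoolExecutor


def write_batches_parallel(client, table, batches, max_workers=8):
    """Write each batch (a list of at most 25 items) from a pool of threads.

    `client` is assumed to be a low-level boto3 DynamoDB client shared by all
    threads. Returns the total number of items submitted.
    """
    def write_one(batch):
        request = {table: [{'PutRequest': {'Item': item}} for item in batch]}
        # Resubmit until nothing is reported as unprocessed.
        while request:
            response = client.batch_write_item(RequestItems=request)
            request = response.get('UnprocessedItems') or None
        return len(batch)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(write_one, batches))
```

Note that `pool.map` submits every batch up front, so for a 10M-line file it is safer to feed it one slice of batches at a time. Whether this helps at all depends on how well DynamoDB Local handles concurrent writes.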

[Image: resource utilization while executing the loading function]

CodePudding user response:

DynamoDB Local is not designed for performance at all. It is meant only for offline functional development and testing before you deploy to production on the actual DynamoDB service.
