best way to process large data in chunks


I have more than 20,000 records, and the data looks something like this:

data = [{'id': 1}, {'id': 2}, {'id': 3}, ... , {'id': 20000}]

Now I want to upload this data in chunks of 1000. What is the best way to split it into chunks of 1000 with the least overhead?

CodePudding user response:

The best way is to use generators. These are iterators that produce values lazily, so you can traverse large collections with custom behaviour without building intermediate lists.

In your case, an easy solution is to use range, which lazily produces the start index of each chunk, for example:

range(0, len(data), 1000)

will generate the values 0, 1000, 2000, ...

If you use that in a loop, you can pass each slice to a handler method, for example:

batch = 1000
for i in range(0, len(data), batch):
  handle(data[i:i + batch])
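
For reference, here is a minimal runnable sketch of that pattern; the sample data and the handle function are stand-ins for your real records and upload call:

data = [{'id': i} for i in range(1, 20001)]  # sample data shaped like the question's

def handle(chunk):
  # hypothetical upload step; replace with your real bulk-upload call
  print(f"uploading {len(chunk)} records, first id {chunk[0]['id']}")

batch = 1000
for i in range(0, len(data), batch):
  handle(data[i:i + batch])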

CodePudding user response:

You can use a generator function to process the data in batches:

def generateChunks(data, batchsize):
  for i in range(0, len(data), batchsize):
    yield data[i:i + batchsize]

Process the data:

chunks = generateChunks(data, 1000)
for chunk in chunks:
  print(chunk)
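
As an alternative, if you are on Python 3.12 or newer, the standard library's itertools.batched does the same chunking without a hand-written generator; a minimal sketch:

import itertools

data = [{'id': i} for i in range(1, 20001)]  # sample data shaped like the question's

for chunk in itertools.batched(data, 1000):
  # chunk is a tuple of up to 1000 dicts; pass it to your upload call here
  print(len(chunk))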