Concatenating strings in Python at large scale

Say that I have a massive list of dictionaries (2 million of them). I essentially need to json.dumps() each dictionary and accumulate the results into one massive string (to put in the body of a bulk request to AWS OpenSearch). So far I have this:

import json

json_data = ''
action = {'index': {}}
for item in data:
    json_data += f'{json.dumps(action)}\n'
    json_data += f'{json.dumps(item)}\n'

where data is the large list of dictionaries. This takes on average between 0.9 and 1 second. Is there a more efficient way to do this?

Other SO questions conclude that if this were a simple one-off concatenation, c = a + b would be the fastest way; in my case, however, I have to keep appending to what would be c, and I have to repeat this operation many times, so speeding it up would be immensely helpful. Is there a way to speed up this function, and if so, what would those optimizations look like?
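
For reference, candidates can be compared on synthetic data with something along these lines (the dict shape and sizes here are made up for illustration; CPython sometimes optimizes in-place str concatenation, so the gap is worth measuring rather than assuming):

import json
import timeit

# Synthetic stand-in for the real payload; shape and size are illustrative.
data = [{'id': i, 'value': f'doc-{i}'} for i in range(100_000)]
action = {'index': {}}

def concat():
    s = ''
    for item in data:
        s += f'{json.dumps(action)}\n'
        s += f'{json.dumps(item)}\n'
    return s

def joined():
    parts = []
    for item in data:
        parts.append(json.dumps(action))
        parts.append(json.dumps(item))
    return '\n'.join(parts)

print('concat:', timeit.timeit(concat, number=3))
print('join:  ', timeit.timeit(joined, number=3))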

CodePudding user response:

Repeated string concatenation is slow. A better approach would be to build up a list of strings, and then join them at the end. I don't have access to your data, so I can't test this, but you'd be going for something along the lines of:

import json

json_data = []
action = {'index': {}}
for item in data:
    json_data.append(action)
    json_data.append(item)
# Serialize everything in one pass and join once at the end.
result = '\n'.join([json.dumps(blob) for blob in json_data])
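
The reason this wins: str.join sizes the final buffer up front and copies each fragment exactly once, so the work stays linear in the total output length, whereas repeated concatenation can recopy the accumulated string on every append. Note that this version still serializes the same constant action dict two million times; the variation in the next response avoids that.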

CodePudding user response:

Variation...

import json
json_data = []
action = json.dumps({'index': {}}) # dumps is only called on this once
for item in data:
    # json_data will be a list of strings
    json_data.append(action)
    json_data.append(json.dumps(item))
result = '\n'.join(json_data)
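
Two further tweaks worth trying on top of this, sketched below under the assumption that data is the list from the question: json.dumps accepts a separators argument that strips the default whitespace from every serialized document, and the OpenSearch _bulk endpoint expects the body to end with a newline, so one should be appended after the join:

import json
from itertools import chain

def compact(obj):
    # Compact separators drop the spaces json.dumps emits by default
    # after ',' and ':'; across 2 million documents the smaller output
    # also means less copying at join time.
    return json.dumps(obj, separators=(',', ':'))

action = compact({'index': {}})
pairs = ((action, compact(item)) for item in data)
# The _bulk endpoint requires the body to terminate with a newline.
body = '\n'.join(chain.from_iterable(pairs)) + '\n'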