How to quickly convert lots of data to string?

Time:10-27

Say I've got the following class:

import numpy as np

class ToStringify():
    DEMARCATION = "::::"
    def __init__(self):
        self.first_np_array = np.random.rand(30, 30, 30)
        self.second_np_array = np.random.rand(30, 30, 30)
        self.some_string = "string"
        self.some_int = 5

    def to_str_format(self) -> str:
        entries = [
            self.first_np_array, self.second_np_array, self.some_string, self.some_int
        ]
        return f"{self.DEMARCATION}".join([str(entry) for entry in entries])

I've profiled my code, and to_str_format takes about 25% of my total program running time. Unfortunately, to_str_format needs to output a string (with whatever demarcation I choose) that will be consumed further down in a pipeline I cannot change. I'm using a list comprehension to try to speed things up, but other than that I'm not sure what else I can do, if anything. I'm using Python 3.9, if that changes anything.

Answer:

You'll have to profile again to verify if any of these suggestions make a big enough difference to matter, but quick informal testing shows maybe a 10-15% improvement, so it may be helpful.
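One way to do that re-check is a small timeit harness like the sketch below. It recreates arrays of the same shape as in the question (random data, so the numbers are only indicative and will vary by machine) and times each way of building the string. It also times the str() calls on the two arrays alone, which shows how much of the cost comes from NumPy's array formatting rather than from the join. The function names here are made up for this illustration.

```python
import timeit

import numpy as np

# Stand-ins for the question's attributes (same shapes and types, random data).
first = np.random.rand(30, 30, 30)
second = np.random.rand(30, 30, 30)
entries = [first, second, "string", 5]
SEP = "::::"


def list_comp():
    return SEP.join([str(e) for e in entries])


def gen_expr():
    return SEP.join(str(e) for e in entries)


def with_map():
    return SEP.join(map(str, entries))


def arrays_only():
    # Formatting the two arrays by themselves, to compare against the full call.
    return str(first), str(second)


if __name__ == "__main__":
    calls = 20
    for fn in (list_comp, gen_expr, with_map, arrays_only):
        # Best of 5 repeats of `calls` calls each, reported per call.
        best = min(timeit.repeat(fn, number=calls, repeat=5))
        print(f"{fn.__name__:12s} {best / calls * 1e3:.3f} ms per call")
```

If arrays_only accounts for most of the time, the join-level tweaks below can only shave off the remainder.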

First, this:

join([str(entry) for entry in entries])

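The [ ] aren't strictly necessary, since join() accepts any iterable, so you can pass a generator expression instead:

join(str(entry) for entry in entries)

Quick testing suggested this was about 5% faster, but treat that with caution: CPython's str.join() converts any argument that isn't already a list or tuple into a list before joining, so the generator doesn't avoid creating that list and is often slightly slower than the list comprehension. Time both. A more reliable improvement (about 10% in the same informal testing) is: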

join(map(str, entries))
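All three forms produce the same string; they differ only in how the per-item str() calls are driven. The conversion step is required because str.join() only accepts str items. A small check with stand-in values (not the question's data):

```python
import numpy as np

SEP = "::::"
entries = [np.arange(3), "string", 5]  # small stand-ins for the real attributes

a = SEP.join([str(entry) for entry in entries])  # list comprehension
b = SEP.join(str(entry) for entry in entries)    # generator expression
c = SEP.join(map(str, entries))                  # map with the built-in str

print(c)  # [0 1 2]::::string::::5
assert a == b == c

# Without the str() conversion, join() rejects the non-str items with a TypeError.
try:
    SEP.join(entries)
except TypeError as exc:
    print("TypeError:", exc)
```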

Second, the f-string. DEMARCATION is already a str, so f"{self.DEMARCATION}" just produces the same string again. Unless you need to change how it is represented, this is sufficient:

self.DEMARCATION.join(....)

which avoids the overhead of evaluating the f-string. Altogether, this may be the most efficient form:

return self.DEMARCATION.join(map(str, entries))
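Dropped into the question's class, the method might look like the sketch below. Only the return line differs from the original code, so the output should be identical to what the original to_str_format produced.

```python
import numpy as np


class ToStringify:
    DEMARCATION = "::::"

    def __init__(self):
        self.first_np_array = np.random.rand(30, 30, 30)
        self.second_np_array = np.random.rand(30, 30, 30)
        self.some_string = "string"
        self.some_int = 5

    def to_str_format(self) -> str:
        entries = [
            self.first_np_array, self.second_np_array, self.some_string, self.some_int
        ]
        # DEMARCATION is already a str, so join() is called on it directly,
        # and map(str, ...) converts each entry without a per-item Python expression.
        return self.DEMARCATION.join(map(str, entries))
```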

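One additional thing is the entries list. If to_str_format() is called many times and entries always refers to the same four objects, it's better to build it once (for example, as an instance attribute set in __init__) instead of on every call. The caveat is that the cached container holds references to those objects: in-place changes to the arrays will show up in the output, but if any attribute is later reassigned to a new object, the cached version will still point at the old one. It may also be slightly faster to create it as a tuple rather than a list.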
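A minimal sketch of the cached version, assuming the attributes are set once in __init__ and never reassigned afterwards. The attribute name _entries is hypothetical. The two lines at the end illustrate the caveat: an in-place change to an array is visible through the cached tuple, while reassigning an attribute is not.

```python
import numpy as np


class ToStringify:
    DEMARCATION = "::::"

    def __init__(self):
        self.first_np_array = np.random.rand(30, 30, 30)
        self.second_np_array = np.random.rand(30, 30, 30)
        self.some_string = "string"
        self.some_int = 5
        # Built once (hypothetical name). The tuple holds references to these
        # objects, so it goes stale if an attribute is later rebound.
        self._entries = (
            self.first_np_array, self.second_np_array, self.some_string, self.some_int
        )

    def to_str_format(self) -> str:
        return self.DEMARCATION.join(map(str, self._entries))


obj = ToStringify()
obj.first_np_array[0, 0, 0] = 0.0  # in-place change: seen through the cached tuple
assert obj._entries[0][0, 0, 0] == 0.0
obj.some_int = 6                   # rebinding: the cached tuple still holds 5
assert obj.to_str_format().endswith("::::string::::5")
```

If attributes can be reassigned, either rebuild _entries at that point or keep building the tuple inside to_str_format().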
