I'm looking for the best way to split a long bytes string that looks like b'a: 1\nb: 2\n ...' into a dict.
It has about 50-70 keys.
Each string is 8-10K bytes long, and I get about 1K strings per second.
The best method I have so far looks like:
dict(x.split(b": ") for x in bytes(headers).split(b'\n'))
Maybe Cython would give a better result?
CodePudding user response:
Maybe you are looking for something like this, which uses itertools to save memory on long strings:
from itertools import pairwise  # requires Python 3.10+

def string_to_dict(str_value):
    # l is a list of the indices of each b'\n' inside the string;
    # -1 and len(str_value) are sentinels so that the first and last
    # lines are included as well
    l = [-1]
    i = 0
    while i < len(str_value):
        # indexing bytes yields an int, so compare a one-byte slice
        if str_value[i:i + 1] == b'\n':
            l.append(i)
        i += 1
    l.append(len(str_value))
    # pairwise(l) gives 2-tuples of indices that bound each
    # substring in the format b'key: value'
    # str_value[x[0] + 1:x[1]].split(b': ') gives a [key, value] pair
    # used to build the dict
    result_dict = dict(str_value[x[0] + 1:x[1]].split(b': ') for x in pairwise(l))
    return result_dict
Or, to save even more memory, the following finds the indices lazily with a generator, at the cost of extra computation:
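from itertools import chain, pairwise

def string_to_dict(str_value):
    # lazily yield the index of each b'\n' (compare a one-byte slice, since indexing bytes yields an int)
    w = (i for i in range(len(str_value)) if str_value[i:i + 1] == b'\n')
    # the sentinels -1 and len(str_value) make sure the first and last lines are included
    result_dict = dict(str_value[x[0] + 1:x[1]].split(b': ') for x in pairwise(chain([-1], w, [len(str_value)])))
    return result_dict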
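The same index-based idea can also be sketched with bytes.find, which searches for the next newline in C instead of testing every byte in a Python loop. This is only an illustration, not a measured improvement: the names iter_pairs and string_to_dict_find are made up here, and it uses partition rather than split, so a line without b": " becomes a key with an empty value instead of raising an error.

```python
def iter_pairs(data, sep=b": "):
    """Yield (key, value) pairs from b'key: value' lines, one line at a time."""
    start = 0
    end = len(data)
    while start < end:
        # bytes.find returns -1 when no further newline exists
        nl = data.find(b"\n", start)
        if nl == -1:
            nl = end
        line = data[start:nl]
        if line:  # skip empty lines, e.g. after a trailing newline
            # partition splits on the first separator only
            key, _, value = line.partition(sep)
            yield key, value
        start = nl + 1


def string_to_dict_find(data):
    # hypothetical helper name; builds the dict without a list of lines
    return dict(iter_pairs(data))


sample = b"a: 1\nb: 2\nc: 3\n"
print(string_to_dict_find(sample))
```

Because it does not need pairwise, this variant also runs on Python versions older than 3.10.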
CodePudding user response:
As long as the input is well-formed, we could replace the b": " delimiter with the line delimiter (b"\n"), split on both at once, and then slice the result for keys and values.
The code looks something like:
def fast_split(data):
    items = bytes(data).replace(b": ", b"\n").split(b"\n")
    return dict(zip(items[::2], items[1::2]))
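To make the intermediate steps visible, here is the same replace-and-split idea unrolled on a small made-up input (the header names are just sample data, not from the question):

```python
data = b"Host: example.com\nAccept: */*\nConnection: close"

# Step 1: turn every b": " into b"\n" so keys and values share one delimiter
flat = data.replace(b": ", b"\n")

# Step 2: a single split now yields keys and values interleaved
items = flat.split(b"\n")

# Step 3: even positions are keys, odd positions are values
keys = items[::2]
values = items[1::2]
result = dict(zip(keys, values))

print(items)
print(result)
```

The pairing relies on every line contributing exactly two items, which is why the input has to be well-formed.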
On my machine, it's about 3x faster:
from timeit import timeit

size = 100000
test_str = b"\n".join([b"a: 1"] * size)

def slow_split(data):
    return dict(x.split(b": ") for x in bytes(data).split(b'\n'))

def fast_split(data):
    items = bytes(data).replace(b": ", b"\n").split(b"\n")
    return dict(zip(items[::2], items[1::2]))

print(fast_split(test_str) == slow_split(test_str))
print(timeit("slow_split(test_str)", number=100, setup="from __main__ import slow_split, test_str"))
print(timeit("fast_split(test_str)", number=100, setup="from __main__ import fast_split, test_str"))
True
1.373571052972693
0.4970768200000748
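One caveat on the "well-formed" assumption: if a value itself contains b": ", fast_split does not raise an error but shifts the key/value pairing. If that can happen in your data, a per-line split with maxsplit=1 is a safer sketch. The name tolerant_split and the X-Note header are made up for illustration, and I have not timed this variant.

```python
def fast_split(data):
    items = bytes(data).replace(b": ", b"\n").split(b"\n")
    return dict(zip(items[::2], items[1::2]))


def tolerant_split(data):
    # maxsplit=1 splits only on the first b": " of each line, so the
    # rest of the line stays in the value; empty lines are skipped
    return dict(
        line.split(b": ", 1) for line in bytes(data).split(b"\n") if line
    )


tricky = b"Host: example.com\nX-Note: a: b"
print(fast_split(tricky))      # the value of X-Note is cut off at b"a"
print(tolerant_split(tricky))  # the value of X-Note is kept as b"a: b"
```

A line with no b": " at all still raises ValueError in tolerant_split, because dict() needs exactly two items per line.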