I am trying to find the fastest possible way to convert a binary string to an array of 0s and 1s. I am currently using Python 3.8 and have the following two functions to obtain such an array:
import numpy as np
from typing import Literal, Sequence
def string_to_array(Bin_String):
    Bin_array = [int(Bin_String[i], 2) for i in range(len(Bin_String))]
    return Bin_array

def string_to_array_LtSq(string: Sequence[Literal['0', '1']]) -> np.ndarray:
    return np.array([int(c) for c in string])
For a string of length 1024, the string_to_array_LtSq function takes 20 microseconds less than the other (average 370 microseconds), though I don't understand why it is faster, since both use the int function.
But this is an important part of the code, so is there a faster way in Python?
Also, is it possible to do this faster in another language (for example, C)? I might switch to that language.
Thanks.
CodePudding user response:
Try:
s = '0011'
print(np.frombuffer(s.encode("ascii"), dtype="u1") - 48)
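The trick relies on the ASCII codes of '0' and '1' being 48 and 49: np.frombuffer views the encoded bytes as unsigned 8-bit integers, and subtracting 48 maps them to 0 and 1. A minimal sketch of that idea, assuming the input contains only '0' and '1'; the helper name bits_from_string and its validation step are illustrative additions, not part of the answer above:

```python
import numpy as np

def bits_from_string(s: str) -> np.ndarray:
    """Hypothetical helper: convert a string of '0'/'1' to a uint8 array."""
    # View the ASCII bytes as unsigned 8-bit integers (no copy is made here).
    codes = np.frombuffer(s.encode("ascii"), dtype=np.uint8)
    # ord("0") == 48 and ord("1") == 49, so subtracting 48 gives 0 and 1.
    bits = codes - ord("0")
    # Optional sanity check (an addition): uint8 arithmetic wraps around,
    # so any other character ends up with a value greater than 1.
    if bits.size and bits.max() > 1:
        raise ValueError("input must contain only '0' and '1'")
    return bits

print(bits_from_string("0011"))  # [0 0 1 1]
```

The check costs an extra pass over the array, so it may be worth dropping if the input is already known to be clean.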
Benchmark:
import numpy as np
from timeit import timeit
s = "1011" * 256 # length = 1024
def f1():
    return np.frombuffer(s.encode("ascii"), dtype="u1") - 48

def f2():
    return np.array([int(c) for c in s])

def f3():
    return list(map(int, s))

def f4():
    return [int(c) for c in s]

t1 = timeit(f1, number=1_000)
t2 = timeit(f2, number=1_000)
t3 = timeit(f3, number=1_000)
t4 = timeit(f4, number=1_000)
print(t1)
print(t2)
print(t3)
print(t4)
Prints:
0.00223864201689139
0.18963027599966154
0.10751374304527417
0.13433810899732634
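Note that timeit returns the total time in seconds for all `number` calls, so the figures above are totals for 1,000 calls each (0.00224 s in total is roughly 2.2 microseconds per call). A rough sketch for reporting per-call times instead, taking the best of several rounds; the helper name per_call_us is hypothetical and only f1 is repeated here to keep it short:

```python
import numpy as np
from timeit import repeat

s = "1011" * 256  # length = 1024

def f1():
    return np.frombuffer(s.encode("ascii"), dtype="u1") - 48

def per_call_us(func, number=1_000, rounds=5):
    """Hypothetical helper: best-of-`rounds` time per call, in microseconds."""
    # repeat() returns one total (in seconds) per round of `number` calls.
    totals = repeat(func, number=number, repeat=rounds)
    # The minimum is usually the least disturbed by other system activity.
    return min(totals) / number * 1e6

print(f"f1: {per_call_us(f1):.2f} us per call")
```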
EDIT: added functions that create only a Python list (instead of an np.array)
CodePudding user response:
bytearray appears to be even faster than Andrej's NumPy solution, and bytes can be used for a fast list solution. Times for 1,000 calls each with 1024 bits (output truncated to the first 5 elements):
f1 2.7 ms [1 0 1 1 1]
f2 2.0 ms bytearray(b'\x01\x00\x01\x01\x01')
f3 7.6 ms [1, 0, 1, 1, 1]
Code based on Andrej's:
import numpy as np
from timeit import timeit
s = "1011" * 256 # length = 1024
def f1():
    return np.frombuffer(s.encode("ascii"), dtype="u1") - 48

table = bytearray.maketrans(b'01', b'\x00\x01')

def f2():
    return bytearray(s, "ascii").translate(table)

def f3():
    return [*s.encode().translate(table)]

for _ in range(3):
    for f in f1, f2, f3:
        t = timeit(f, number=1_000)
        t = '%5.1f ms ' % (t * 1e3)
        print(f.__name__, t, f()[:5])
    print()
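For context, translate pushes every byte through a 256-entry lookup table, and maketrans builds a table that sends byte 48 ('0') to 0 and byte 49 ('1') to 1 while leaving all other bytes unchanged. A small sketch checking that the three approaches agree; the last step assumes a NumPy array is what is ultimately wanted and shows that the bytearray can be viewed as one without copying:

```python
import numpy as np

s = "1011" * 256  # length = 1024

# 256-byte lookup table: byte 48 ('0') -> 0, byte 49 ('1') -> 1,
# every other byte maps to itself.
table = bytes.maketrans(b"01", b"\x00\x01")

as_numpy = np.frombuffer(s.encode("ascii"), dtype="u1") - 48
as_bytearray = bytearray(s, "ascii").translate(table)
as_list = list(s.encode("ascii").translate(table))

# All three hold the same 0/1 values.
assert as_numpy.tolist() == list(as_bytearray) == as_list

# If a NumPy array is needed in the end, the bytearray can be viewed
# as uint8 without copying (the array shares the bytearray's memory).
view = np.frombuffer(as_bytearray, dtype=np.uint8)
print(view[:5])  # [1 0 1 1 1]
```

Unlike the subtraction approach, the table leaves any unexpected byte as it is rather than wrapping it around, so malformed input shows up as values other than 0 and 1.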