Fastest Way to convert a Binary String to Binary Array (Array of 1 and 0)-CodePudding

I am trying to find the fastest possible way to convert a binary string to an array of 0 and 1. I am currently using python 3.8, and have the following two functions to obtain such array:

import numpy as np
from typing import Literal, Sequence
def string_to_array(Bin_String):
    Bin_array=[int(Bin_String[i],2) for i in range(len(Bin_String))]
    return Bin_array

def string_to_array_LtSq(string: Sequence[Literal['0', '1']]) -> np.ndarray:
    return np.array([int(c) for c in string])

For a string of length 1024, string_to_array_LtSq function takes 20 micro-seconds less than the other (average 370 micro-seconds) though I don't understand why it is faster since both are using int function.

But this is an important part of the code, so is there a faster way in python?

Also, is it possible to do faster in any other language (for example c)? I might switch to that language.

Thanks.

Convert Bitstring (String of 1 and 0s) to numpy array

CodePudding user response：

Try:

s = '0011'

print(np.frombuffer(s.encode("ascii"), dtype="u1") - 48)

Benchmark:

import numpy as np
from timeit import timeit

s = "1011" * 256  # length = 1024


def f1():
    return np.frombuffer(s.encode("ascii"), dtype="u1") - 48


def f2():
    return np.array([int(c) for c in s])


def f3():
    return list(map(int, s))


def f4():
    return [int(c) for c in s]


t1 = timeit(f1, number=1_000)
t2 = timeit(f2, number=1_000)
t3 = timeit(f3, number=1_000)
t4 = timeit(f4, number=1_000)

print(t1)
print(t2)
print(t3)
print(t4)

Prints:

0.00223864201689139
0.18963027599966154
0.10751374304527417
0.13433810899732634

^{EDIT: added functions which creates only python list (instead of np.array)}

CodePudding user response：

bytearray appears to be even faster than Andrej's NumPy solution. And bytes can be used for a fast list solution. Times with 1024 bits (only showing the first 5):

f1   2.7 ms  [1 0 1 1 1]
f2   2.0 ms  bytearray(b'\x01\x00\x01\x01\x01')
f3   7.6 ms  [1, 0, 1, 1, 1]

Code based on Andrej's (Try it online!):

import numpy as np
from timeit import timeit

s = "1011" * 256  # length = 1024


def f1():
    return np.frombuffer(s.encode("ascii"), dtype="u1") - 48


table = bytearray.maketrans(b'01', b'\x00\x01')

def f2():
    return bytearray(s, "ascii").translate(table)


def f3():
    return [*s.encode().translate(table)]


for _ in range(3):
    for f in f1, f2, f3:
        t = timeit(f, number=1_000)
        t = '%5.1f ms ' % (t * 1e3)
        print(f.__name__, t, f()[:5])
    print()