Home > Blockchain >  Why is using integers much slower than using strings in this problem dealing with binary strings?
Why is using integers much slower than using strings in this problem dealing with binary strings?

Time:03-14

I solved the following leet code problem https://leetcode.com/problems/find-kth-bit-in-nth-binary-string/ by using strings (class Solution) and integers (class Solution2). I thought the integer solution should be a lot faster and using much less memory. But actually it takes a lot longer. With n=18 and k=200 it took 10.1 seconds compared to 0.13 seconds of the string solution.

import time

class Solution:
    def findKthBit(self, n: int, k: int) -> str:
        i = 0
        Sn = "0"
        Sn = self.findR(Sn, i, n)
        return Sn[k-1]

    def findR(self, Sn, i, n):
        if i == n:
            return Sn
        newSn = self.calcNewSn(Sn, i)
        return self.findR(newSn, i 1, n)

    def calcNewSn(self, Sn, i):
        inverted = ""
        for c in Sn:
            inverted  = "1" if c == "0" else "0"
        newSn = Sn   "1"   (inverted)[::-1]
        return newSn


class Solution2:
    def findKthBitBin(self, n: int, k: int) -> str:
        i = 0
        # MSB (S1) has to be 1 for bit operations but is actually always 0
        Sn = 1
        Sn = self.findRBin(Sn, i, n)
        lenSn = (2**(n 1)) - 1
        return "0" if k == 1 else str((Sn >> (lenSn - k)) & 1)

    def findRBin(self, Sn, i, n):
        if i == n:
            return Sn
        newSn = self.calcNewSnBin(Sn, i)
        return self.findRBin(newSn, i 1, n)

    def calcNewSnBin(self, Sn, i):
        lenSn = 2**(i 1) - 1
        newSn = (Sn << 1) | 1
        inverted = (~Sn | (1 << (lenSn - 1))) & self.getMask(lenSn)
        newSn = self.reverseBits(newSn, inverted, lenSn)
        return newSn

    def getMask(self, lenSn):
        mask = 0
        for i in range(lenSn):
            mask |= (1 << i)
        return mask

    def reverseBits(self, newSn, inverted, lenSn):
        for i in range(lenSn):
            newSn <<= 1
            newSn |= inverted & 1
            inverted >>= 1
        return newSn
 
sol = Solution()
sol2 = Solution2()

start = time.time()
print(sol.findKthBit(18, 200))
end = time.time()
print(f"time using strings: {end - start}")
start = time.time()
print(sol2.findKthBitBin(18, 200))
end = time.time()
print(f"time using large integers: {end - start}")

Why is the solution using integers so slow? Are the implemetations of big ints faster in other languages compared to strings?

CodePudding user response:

The problem isn't how fast or slow the integer implementation is. The problem is that you've written an algorithm that wants a random-access mutable sequence data type, like a list, and integers are not a random-access mutable sequence.

For example, look at your reverseBits implementation:

def reverseBits(self, newSn, inverted, lenSn):
    for i in range(lenSn):
        newSn <<= 1
        newSn |= inverted & 1
        inverted >>= 1
    return newSn

With a list, you could just call list.reverse(). Instead, your code shifts newSn and inverted over and over, copying almost all the data involved repeatedly on every iteration.

Strings aren't a random-access mutable sequence data type either, but they're closer, and the standard implementation of Python has a weird optimization that actually does try to mutate strings in loops like the one you've written:

def calcNewSn(self, Sn, i):
    inverted = ""
    for c in Sn:
        inverted  = "1" if c == "0" else "0"
    newSn = Sn   "1"   (inverted)[::-1]
    return newSn

Instead of building a new string and copying all the data for the =, Python tries to resize the string in place. When that works, it avoids most of the unnecessary copying you're doing with the integer-based implementation.


The right way to write your algorithm would be to use lists, instead of strings or integers. However, there's an even better way, with a different algorithm that doesn't need to build a bit sequence at all. You don't need the whole bit sequence. You just need to compute a single bit. Figure out how to compute just that bit.

CodePudding user response:

The bulk of the time is spent in reverseBits(). Overall, you're creating hundreds of thousands of new integer objects in that, up to 262143 bits long. It's essentially quadratic time. The corresponding operation for the string form is just (inverted)[::-1], which runs entirely at "C speed", and is worst-case linear time.

  • Related