I'm having trouble trying to count the number of leading zero bits after an sha256 hash function as I don't have a lot of experience on 'low level' stuff in python

hex = hashlib.sha256((some_data_from_file).encode('ascii')).hexdigest()
# hex = 0000094e7cc7303a3e33aaeaba76ad937603d4d040064f473a12f10ab30a879f
# this has 20 leading zero bits
hex_digits = int.from_bytes(bytes(hex.encode('ascii')), 'big') #convert str to int

#count num of leading zeroes 
def countZeros(x):
   total_bits = 256
   res = 0
   while ((x & (1 << (total_bits - 1))) == 0):
       x = (x << 1)
       res  = 1
   return res
print(countZeroes(hex_digits)) # returns 2

I've also tried converting it using bin() however that didn't provide me with any leading zeros.

CodePudding user response：

.hexdigest() returns a string, so your hex variable is a string.

I'm going to call it h instead, because hex is a builtin python function.

So you have:

h = "0000094e7cc7303a3e33aaeaba76ad937603d4d040064f473a12f10ab30a879f"

Now this is a hexadecimal string. Each digit in a hexadecimal number gives you four bits in binary. Since this has five leading zeros, you already have 5 * 4 = 20 leading zeros.

nzeros = 0
for c in h:
    if c == "0": nzeros  = 4
    else: break

Then, you need to count the leading zeros in the binary representation of the first non-zero hexadecimal digit. This is easy to get: A number has math.floor(math.log2(number)) 1 binary digits, i.e. 4 - (math.floor(math.log2(number)) 1) leading zeros if it's a hexadecimal digit, since they can only have a max of 4 bits. In this case, the digit is a 9 (1001 in binary), so there are zero additional leading zeros.

So, modify the previous loop:

nzeros = 0
for c in h:
    if c == "0": nzeros  = 4
    else: 
        digit = int(c, base=16)
        nzeros  = 4 - (math.floor(math.log2(digit))   1)
        break

print(nzeros) # 20

CodePudding user response：

Danger!!!

Is this security-sensitive code? Can this hash ever be the result of hashing secret/private data?

If so, then you should make different choices.

If it's not security-sensitive, then:

If you just want a good balance of simplicity and low-effort:

def count_leading_zeroes(value, max_bits=256):
    value &= (1 << max_bits) - 1  # truncate; treat negatives as 2's compliment
    if value == 0:
        return max_bits
    significant_bits = len(bin(value)) - 2  # has "0b" prefix
    return max_bits - significant_bits

If you want to really embrace the bit twiddling you were trying in your question:

def count_leading_zeroes(value, max_bits=256):
   top_bit = 1 << (max_bits - 1)
   count = 0
   value &= (1 << max_bits) - 1
   while not value & top_bit:
       count  = 1
       value <<= 1
   return count

If you're doing manual bit twiddling, I think in this case a loop which counts from the top is the most justified option, because hashes are rather evenly randomly distributed and so each bit has about even chance of not being zero.

So you have a good chance of exiting the loop early if you start for from the top (if you start from the bottom you have to check every bit).