Make numpy array which is hash of string array

Time:10-25

I have a NumPy array:

A = np.array(['abcd','bcde','cdef'])

I need an array B of hashes of A, computed element-wise with the function

B[i] = ord(A[i][1]) * 256 + ord(A[i][2])

B = np.array([ord('b') * 256 + ord('c'), ord('c') * 256 + ord('d'), ord('d') * 256 + ord('e')])
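To make the expected values concrete, the formula can be applied string by string in plain Python. This is only a slow reference (a list comprehension over the array), not a vectorized solution; the numbers follow from ord('b') = 98, ord('c') = 99, ord('d') = 100 and ord('e') = 101.

```python
import numpy as np

A = np.array(['abcd', 'bcde', 'cdef'])

# Reference implementation: apply the formula to each string with a Python loop.
B = np.array([ord(s[1]) * 256 + ord(s[2]) for s in A])

# 98*256 + 99, 99*256 + 100, 100*256 + 101
print(B.tolist())  # [25187, 25444, 25701]
```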

How can I do it?

Answer:

Based on the question, I assume the strings are ASCII and that every string has at least 3 characters (the hash reads the characters at indices 1 and 2).
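The two assumptions above matter for a concrete reason. The str-to-bytes cast encodes as ASCII, so non-ASCII text makes it fail. Fixed-width byte arrays also pad shorter strings with NUL bytes, so a string shorter than 3 characters would silently hash to a wrong value rather than raise an IndexError. Below is a minimal sketch of a guard; `check_hashable` is a hypothetical helper, not part of NumPy or the answer.

```python
import numpy as np


def check_hashable(A, min_len=3):
    """Hypothetical helper: validate the answer's assumptions before hashing.

    Returns the ASCII byte-string copy of A if every string is ASCII and has
    at least min_len characters; raises ValueError otherwise.
    """
    lengths = np.char.str_len(A)
    if lengths.size and lengths.min() < min_len:
        raise ValueError(f"all strings need at least {min_len} characters")
    try:
        # The U -> S cast encodes as ASCII and fails on other characters.
        return A.astype('S')
    except UnicodeEncodeError as exc:
        raise ValueError("all strings must be ASCII") from exc


# Why the length check matters: 'x' is stored as b'x\x00\x00\x00' in an 'S4'
# array, so its bytes at positions 1 and 2 read as 0 instead of raising.
padded = np.array(['abcd', 'x']).astype('S').view(np.uint8)
print(padded[4:8].tolist())  # [120, 0, 0, 0]

ok = check_hashable(np.array(['abcd', 'bcde', 'cdef']))
print(ok.dtype)  # |S4
```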

You can start by converting the strings to ASCII byte strings (NumPy dtype 'S') for the sake of performance and simplicity; this creates a new temporary array. Since NumPy stores fixed-width strings contiguously in memory, you can then reinterpret the whole array as one big flat array with a view, which involves no copy and converts the characters to integers (np.uint8) at the same time. Finally, you can use strided slicing, with a step equal to the width of one string in bytes, to compute all the hashes in a vectorized way. Here is how:

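import numpy as np
ascii_arr = A.astype('S')            # ASCII byte strings, e.g. dtype 'S4'
buff = ascii_arr.view(np.uint8)      # flat uint8 view of the same bytes (no copy)
step = ascii_arr.itemsize            # number of bytes per string
# widen before multiplying: values up to 255 * 256 + 255 do not fit in uint8
result = buff[1::step].astype(np.uint32) * 256 + buff[2::step]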
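To see what the view and the strided slices actually pick out, it can help to print the flat byte buffer for the example array. The sketch below also reshapes the same bytes into an (n_strings, itemsize) grid, an equivalent way to express the stride: column 1 of the grid holds the same values as `buff[1::itemsize]`. The cast to `np.uint32` in the last step is a precaution not present in the original answer. The product can exceed the `uint8` range, and NumPy releases differ in how they promote `uint8_array * 256`, so an explicit cast avoids depending on that.

```python
import numpy as np

A = np.array(['abcd', 'bcde', 'cdef'])
a_bytes = A.astype('S')           # dtype 'S4': each string occupies 4 bytes
buff = a_bytes.view(np.uint8)     # the same 12 bytes, one uint8 per character
step = a_bytes.itemsize           # 4

print(buff.tolist())
# [97, 98, 99, 100, 98, 99, 100, 101, 99, 100, 101, 102]

# Starting at offset 1 and stepping by itemsize visits the second character
# of every string; offset 2 visits the third.
print(buff[1::step].tolist())     # [98, 99, 100]
print(buff[2::step].tolist())     # [99, 100, 101]

# Equivalent 2-D picture: one row per string, one column per byte.
grid = buff.reshape(-1, step)
print(grid[:, 1].tolist())        # [98, 99, 100]
print(np.shares_memory(grid, a_bytes))  # True: only views, no copies

# Widen before multiplying, then combine the two columns into the hash.
result = buff[1::step].astype(np.uint32) * 256 + buff[2::step]
print(result.tolist())            # [25187, 25444, 25701]
```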

Response:

Congratulations! It is about four times faster:

import time
import numpy as np
Iter = 1000000
A = np.array(['abcd','bcde','cdef','defg'] * Iter)

Ti = time.time()
B = np.zeros(A.size)
for i in range(A.size):
    B[i] = ord(A[i][1]) * 256 + ord(A[i][2])
DT1 = time.time() - Ti    

Ti = time.time()
ascii_arr = A.astype('S')
buff = ascii_arr.view(np.uint8)
step = ascii_arr.itemsize
result = buff[1::step].astype(np.uint32) * 256 + buff[2::step]  # widen to avoid uint8 overflow
DT2 = time.time() - Ti

print("Equal = %s" % np.array_equal(B, result))
print("DT1=%7.2f Sec, DT2=%7.2f Sec, DT1/DT2=%6.2f" % (DT1, DT2, DT1/DT2))

Output:

Equal = True

DT1= 3.37 Sec, DT2= 0.82 Sec, DT1/DT2= 4.11
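Single `time.time()` measurements like the ones above can be noisy. Below is a minimal sketch using the standard library's `timeit` module, which repeats each measurement so the best run can be kept. The array size and repeat counts are arbitrary choices, and the function names `loop_hash` and `vectorized_hash` are illustrative. No timing figures are claimed here, since they depend on the machine and the NumPy version.

```python
import timeit

import numpy as np

A = np.array(['abcd', 'bcde', 'cdef', 'defg'] * 10_000)


def loop_hash():
    # Reference: one Python-level operation per string.
    return np.array([ord(s[1]) * 256 + ord(s[2]) for s in A])


def vectorized_hash():
    # View-and-stride version, widened to avoid uint8 overflow.
    a_bytes = A.astype('S')
    buff = a_bytes.view(np.uint8)
    step = a_bytes.itemsize
    return buff[1::step].astype(np.uint32) * 256 + buff[2::step]


assert np.array_equal(loop_hash(), vectorized_hash())

# Best of 5 repeats, each running the function 3 times; report per-call time.
t_loop = min(timeit.repeat(loop_hash, number=3, repeat=5)) / 3
t_vec = min(timeit.repeat(vectorized_hash, number=3, repeat=5)) / 3
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s, ratio: {t_loop / t_vec:.1f}")
```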
