I have two 1D numpy arrays of strings and a function that takes two strings and generates a score based on some relation between the two input strings:
def get_score(string1, string2):
    # compute score ...
    return score
Is there an efficient way (perhaps using numpy) to apply that function to every combination of elements from the two arrays, producing an array of scores from which I could select the max score?
Answer:
With its large set of operators and ufuncs, numpy can easily do this kind of element-wise computation using broadcasting, one of its fundamental concepts:
In [155]: A = np.array(['one','two','three']); B = np.array(['four','two'])
In [156]: A[:,None] == B    # compare a (3,1) array with a (2,) array
Out[156]:
array([[False, False],
       [False,  True],
       [False, False]])
But this works much better with numeric arrays; there aren't many operations that work on string arrays.
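As a rough illustration of why numeric arrays are easier: if the score can be expressed through numeric per-string features, the whole pairwise table is a single broadcast expression. The score below (negative absolute difference in string length) is a hypothetical stand-in, not the asker's get_score; np.char.str_len supplies the numeric feature and np.unravel_index turns the flat argmax back into a (row, column) pair.

```python
import numpy as np

A = np.array(['one', 'two', 'three'])
B = np.array(['four', 'two'])

# Hypothetical numeric score (assumption): closer lengths score higher, 0 is best.
lenA = np.char.str_len(A)                 # shape (3,)
lenB = np.char.str_len(B)                 # shape (2,)
scores = -np.abs(lenA[:, None] - lenB)    # broadcast (3, 1) with (2,) -> (3, 2)

# argmax works on the flattened table; unravel_index maps it back to 2d indices.
i, j = np.unravel_index(scores.argmax(), scores.shape)
print(scores)
print(A[i], B[j], scores[i, j])
```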
A few of the np.char functions work with 2 arrays:
In [159]: np.char.join(B,A[:,None])
Out[159]:
array([['ofournfoure', 'otwontwoe'],
       ['tfourwfouro', 'ttwowtwoo'],
       ['tfourhfourrfourefoure', 'ttwohtwortwoetwoe']], dtype='<U21')
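np.char.add is another two-argument function that broadcasts the same way, so pairing A[:, None] with B gives every pairwise concatenation in a (3, 2) table. This is only a quick sketch of the broadcasting behaviour; concatenation is not itself a score.

```python
import numpy as np

A = np.array(['one', 'two', 'three'])
B = np.array(['four', 'two'])

# Row i, column j holds A[i] + B[j]; the result has shape (3, 2).
pairs = np.char.add(A[:, None], B)
print(pairs)
```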
Expanding both arrays into full 2d arrays (functionally equivalent to broadcasting A[:,None] against B):
In [160]: np.meshgrid(A,B,indexing='ij')
Out[160]:
[array([['one', 'one'],
        ['two', 'two'],
        ['three', 'three']], dtype='<U5'),
 array([['four', 'two'],
        ['four', 'two'],
        ['four', 'two']], dtype='<U4')]
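With the two grids from np.meshgrid, an ordinary Python function can be applied pair by pair and the results reshaped back into a score table. The body of get_score here is a hypothetical placeholder (difflib's SequenceMatcher similarity ratio), not the asker's actual function; substitute the real one.

```python
import difflib
import numpy as np

def get_score(string1, string2):
    # Placeholder score (assumption): similarity ratio in [0, 1].
    return difflib.SequenceMatcher(None, string1, string2).ratio()

A = np.array(['one', 'two', 'three'])
B = np.array(['four', 'two'])

AA, BB = np.meshgrid(A, B, indexing='ij')   # both grids have shape (3, 2)
# Plain Python loop over the flattened grids, then restore the 2d shape.
scores = np.array([get_score(a, b) for a, b in zip(AA.ravel(), BB.ravel())])
scores = scores.reshape(AA.shape)

i, j = np.unravel_index(scores.argmax(), scores.shape)
print(A[i], B[j], scores[i, j])
```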
np.vectorize can be used to broadcast a function that takes scalar inputs (single strings) over whole arrays. For small arrays it tends to be slower than a list comprehension, but for large arrays it scales somewhat better.
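A minimal np.vectorize sketch for the original question. The get_score body is a hypothetical placeholder (difflib similarity ratio). Calling the vectorized function on A[:, None] and B broadcasts (3, 1) against (2,), so the result is the full (3, 2) score table, and argmax plus np.unravel_index picks out the best pair. Passing otypes=[float] fixes the output dtype rather than letting vectorize infer it from the first call.

```python
import difflib
import numpy as np

def get_score(string1, string2):
    # Placeholder score (assumption): difflib similarity ratio in [0, 1].
    return difflib.SequenceMatcher(None, string1, string2).ratio()

A = np.array(['one', 'two', 'three'])
B = np.array(['four', 'two'])

vscore = np.vectorize(get_score, otypes=[float])
scores = vscore(A[:, None], B)     # broadcast (3, 1) with (2,) -> (3, 2)

# Best score and the pair of strings that produced it.
i, j = np.unravel_index(scores.argmax(), scores.shape)
print(scores.max(), A[i], B[j])
```

Note that the numpy documentation describes vectorize as a convenience wrapper that is essentially a Python for loop, so it still calls get_score once per pair.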
In short, there's a lot of power in numpy for doing numeric element-wise operations, less so for strings.
Answer:
You would have to iterate. For instance, if pairscore computes the score of two elements:
def max_score(arr1, arr2):
    # pairscore(x1, x2) scores one pair of strings; try every combination and keep the best
    return max(pairscore(x1, x2) for x1 in arr1 for x2 in arr2)
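If you also need to know which pair produced the maximum, itertools.product performs the same double loop, and max with a key function returns the winning pair itself. pairscore is given a hypothetical placeholder body (difflib similarity ratio) so the sketch runs on its own.

```python
import difflib
import itertools

def pairscore(x1, x2):
    # Placeholder pairwise score (assumption): difflib similarity ratio in [0, 1].
    return difflib.SequenceMatcher(None, x1, x2).ratio()

A = ['one', 'two', 'three']
B = ['four', 'two']

# product(A, B) yields every (a, b) combination, in the same order as the nested loop.
best_pair = max(itertools.product(A, B), key=lambda p: pairscore(*p))
best_score = pairscore(*best_pair)
print(best_pair, best_score)
```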