How can I apply a function element-wise to two arrays?

Time:09-20

I have two 1D numpy arrays of strings and a function that takes two strings and generates a score based on some relations between the two input strings.

def get_score(string1, string2):
    # compute score ...
    return score

Is there an efficient way (perhaps using numpy) to apply that function to all combinations of the two arrays to generate an array with the scores from which I could select the max score?

CodePudding user response:

With its large set of operators and ufuncs, numpy can easily do this kind of element-wise computation, relying on the fundamental concept of broadcasting:

In [155]: A = np.array(['one','two','three']); B = np.array(['four','two'])

In [156]: A[:,None] == B      # compare a (3,1) array with a (2,)
Out[156]: 
array([[False, False],
       [False,  True],
       [False, False]])

But this works much better with numeric arrays. There aren't many actions that work with string arrays.
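As a rough illustration of the numeric case, the sketch below reduces each string to a number first and then broadcasts the arithmetic. Using the absolute difference in string length as the "score" is purely an assumption made for the example, not the scoring function from the question.

```python
import numpy as np

A = np.array(['one', 'two', 'three'])
B = np.array(['four', 'two'])

# Reduce each string to a number (its length), giving ordinary integer arrays.
len_A = np.char.str_len(A)   # lengths 3, 3, 5
len_B = np.char.str_len(B)   # lengths 4, 3

# Broadcast a (3,1) array against a (2,) array to get all 3x2 pairs at once.
# The absolute length difference is a hypothetical stand-in score.
diff = np.abs(len_A[:, None] - len_B)
print(diff)
```

Once the data is numeric, the whole pairwise table comes from a single broadcast expression with no Python-level loop.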

A few of the np.char functions work with 2 arrays:

In [159]: np.char.join(B,A[:,None])
Out[159]: 
array([['ofournfoure', 'otwontwoe'],
       ['tfourwfouro', 'ttwowtwoo'],
       ['tfourhfourrfourefoure', 'ttwohtwortwoetwoe']], dtype='<U21')

Expanding the arrays into 2d arrays (functionally the same as A[:,None]):

In [160]: np.meshgrid(A,B,indexing='ij')
Out[160]: 
[array([['one', 'one'],
        ['two', 'two'],
        ['three', 'three']], dtype='<U5'),
 array([['four', 'two'],
        ['four', 'two'],
        ['four', 'two']], dtype='<U4')]

np.vectorize can be used to apply broadcasting to a function that takes scalar inputs (single strings). For small arrays it tends to be slower than a list comprehension, but for large arrays it scales somewhat better.
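A minimal sketch of the np.vectorize approach follows. The body of get_score here is a hypothetical stand-in (it counts the distinct characters the two strings share), since the real scoring logic is not shown in the question.

```python
import numpy as np

def get_score(string1, string2):
    # Hypothetical stand-in score: number of distinct shared characters.
    return len(set(string1) & set(string2))

A = np.array(['one', 'two', 'three'])
B = np.array(['four', 'two'])

# otypes fixes the output dtype so np.vectorize does not need a
# trial call to infer it.
vscore = np.vectorize(get_score, otypes=[int])

# Broadcasting a (3,1) array against a (2,) array scores every pair.
scores = vscore(A[:, None], B)

# Maximum score and the indices of the pair that produced it.
best = scores.max()
i, j = np.unravel_index(scores.argmax(), scores.shape)
print(scores)
print(best, A[i], B[j])
```

np.vectorize still calls the Python function once per pair, so this is mainly a convenience for getting broadcasting behavior, not a way to get compiled-loop speed.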

In short, there's a lot of power in numpy for doing numeric element-wise operations, less so for strings.

CodePudding user response:

You would have to iterate. For instance, if pairscore computes the score of two elements:

def get_score(strings1, strings2):
    # strings1 and strings2 are the two arrays; pairscore scores a single pair
    return max(pairscore(x1, x2) for x1 in strings1 for x2 in strings2)
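If the pair that produced the maximum is also needed, the same iteration can carry the strings along with the score. This is only a sketch: pairscore below is a hypothetical placeholder (distinct shared characters), and best_pair is a name introduced for the example.

```python
from itertools import product

def pairscore(x1, x2):
    # Hypothetical placeholder for the real pairwise scoring function.
    return len(set(x1) & set(x2))

def best_pair(strings1, strings2):
    # Build (score, x1, x2) for every combination and keep the one
    # with the highest score; ties resolve to the first pair seen.
    candidates = ((pairscore(x1, x2), x1, x2)
                  for x1, x2 in product(strings1, strings2))
    return max(candidates, key=lambda t: t[0])

result = best_pair(['one', 'two', 'three'], ['four', 'two'])
print(result)
```

Because the candidates are produced by a generator, no intermediate list of all scores is held in memory.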