Numpy-ish way to count intersection size of two uneven numpy arrays with weights

Time:08-27

I am looking for a numpy-ish way to compute the weighted intersection size of two numpy arrays of different lengths. I have two numpy arrays of ints of the form:

arr1=[[item11,count11],[item12,count12],[item13,count13],...]
arr2=[[item21,count21],[item22,count22],[item23,count23],...]

Let's say that these arrays summarize people's grocery lists, where each element is of the form [itemX, countY] and denotes that a person bought countY copies of itemX. The arrays are of different lengths and unsorted, because different people might buy different items and items that someone didn't buy are not on their grocery list.

I'd like to count the items that appear in both arr1 and arr2, weighting each by its minimum count. For example, if [item1, count1] is in arr1 and [item1, count2] is in arr2, I want to add min(count1, count2) to the running total.

Non-numpy code for this would be:

def intersection_size(arr1, arr2):
    count = 0
    for i in range(len(arr1)):
        for j in range(len(arr2)):
            if arr1[i][0] == arr2[j][0]:
                # accumulate the smaller of the two counts for the shared item
                count += min(arr1[i][1], arr2[j][1])
    return count

Example:

arr1 = [[1,10],[2,100],[3,1000],[4,10000]]
arr2 = [[1,10],[3,100],[4,1000],[5,10000],[6,99]]

This should return 1110, because item 1 contributes min(10, 10) = 10, item 3 contributes min(1000, 100) = 100, and item 4 contributes min(10000, 1000) = 1000.

Thanks for your help!

CodePudding user response:

Using list comprehension:

a = dict(arr1)
b = dict(arr2)
sum([min([a[i], b[i]]) for i in set(a).intersection(b)])
1110
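
The quantity asked for is the size of a multiset intersection, so the standard library can also express it directly. A minimal sketch, assuming each item appears at most once per list and all counts are positive (the `&` operator on `Counter` keeps the minimum count per key and drops non-positive ones); the function name `weighted_intersection_counter` is hypothetical:

```python
from collections import Counter

def weighted_intersection_counter(arr1, arr2):
    # Treat each [item, count] list as a multiset: item -> count
    c1 = Counter(dict(arr1))
    c2 = Counter(dict(arr2))
    # & keeps only shared items, each with the smaller of the two counts
    return sum((c1 & c2).values())

arr1 = [[1, 10], [2, 100], [3, 1000], [4, 10000]]
arr2 = [[1, 10], [3, 100], [4, 1000], [5, 10000], [6, 99]]
counter_total = weighted_intersection_counter(arr1, arr2)
```

For the example in the question this gives 1110.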

Using numpy:

import numpy as np

np1 = np.array(arr1).T
np2 = np.array(arr2).T
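# return_indices gives the position of each shared item in both arrays,
# so counts are paired by item even when the inputs are unsorted
_, idx1, idx2 = np.intersect1d(np1[0], np2[0], return_indices=True)
np.minimum(np1[1, idx1], np2[1, idx2]).sum()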
1110
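
Since the question says the arrays are unsorted, the counts have to be paired by item rather than by position. A minimal sketch of one way to do that, assuming each item appears at most once per array: `np.intersect1d` with `return_indices=True` returns the matching positions in both inputs. The function name `weighted_intersection_np` is hypothetical:

```python
import numpy as np

def weighted_intersection_np(arr1, arr2):
    a = np.asarray(arr1)
    b = np.asarray(arr2)
    # idx_a[k] and idx_b[k] locate the k-th shared item in a and in b
    _, idx_a, idx_b = np.intersect1d(a[:, 0], b[:, 0], return_indices=True)
    # Pair the counts item by item and sum the smaller of each pair
    return int(np.minimum(a[idx_a, 1], b[idx_b, 1]).sum())

arr1 = [[1, 10], [2, 100], [3, 1000], [4, 10000]]
arr2 = [[1, 10], [3, 100], [4, 1000], [5, 10000], [6, 99]]
np_total = weighted_intersection_np(arr1, arr2)
# Reversing one input changes the row order but not the result
np_total_reversed = weighted_intersection_np(arr1[::-1], arr2)
```

Both calls give 1110 for the example data, regardless of row order.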

Using pandas:

import pandas as pd

pd.DataFrame(arr1).merge(pd.DataFrame(arr2), on=0).set_index(0).min(axis=1).sum()
1110
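
The one-liner relies on default integer column labels and merge suffixes. A sketch of the same inner join with explicit names, assuming each item appears at most once per list; the column names `item` and `count` and the function name `weighted_intersection_pd` are hypothetical:

```python
import pandas as pd

def weighted_intersection_pd(arr1, arr2):
    left = pd.DataFrame(arr1, columns=["item", "count"])
    right = pd.DataFrame(arr2, columns=["item", "count"])
    # Inner join keeps only items present in both lists
    merged = left.merge(right, on="item", how="inner", suffixes=("_1", "_2"))
    # Row-wise minimum of the two count columns, then the total
    return int(merged[["count_1", "count_2"]].min(axis=1).sum())

arr1 = [[1, 10], [2, 100], [3, 1000], [4, 10000]]
arr2 = [[1, 10], [3, 100], [4, 1000], [5, 10000], [6, 99]]
pd_total = weighted_intersection_pd(arr1, arr2)
```

Because the join matches on the item column, row order in either list does not affect the result.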