I am looking for a NumPy-ish way to compute the weighted intersection size of two NumPy arrays of different lengths. I have two NumPy arrays of ints of the form:
arr1=[[item11,count11],[item12,count12],[item13,count13],...]
arr2=[[item21,count21],[item22,count22],[item23,count23],...]
Let's say that these arrays summarize people's grocery lists, where each element is of the form [itemX, countY] and denotes that a person bought countY copies of itemX. The arrays are of different lengths and unsorted, because different people might buy different items and items that someone didn't buy are not on their grocery list.
I'd like to count the number of items that appear in both arr1 and arr2, weighting each by its smaller count. For example, if [item1,count1] is in arr1 and [item1,count2] is in arr2, I want to add min(count1,count2) to the total.
A non-NumPy version of this would be:
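def weighted_intersection(arr1, arr2):
    count = 0
    for i in range(len(arr1)):
        for j in range(len(arr2)):
            if arr1[i][0] == arr2[j][0]:
                # same item in both lists: add the smaller of the two counts
                count += min(arr1[i][1], arr2[j][1])
    return count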
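The nested loop compares every pair of entries, which is O(n·m). As a sketch of a sort-and-merge alternative, one can sort both lists by item and walk them in step with two pointers, for O(n log n + m log m). The helper name below is hypothetical, and it assumes each item appears at most once per list.

```python
def weighted_intersection_sorted(arr1, arr2):
    """Sum min(count) over items present in both lists.

    Hypothetical helper: sorts both lists by item, then does a
    merge-style two-pointer scan instead of comparing every pair.
    Assumes each item appears at most once per list.
    """
    a = sorted(arr1, key=lambda pair: pair[0])
    b = sorted(arr2, key=lambda pair: pair[0])
    i = j = 0
    total = 0
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            # shared item: add the smaller count, advance both sides
            total += min(a[i][1], b[j][1])
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1  # a[i] is smaller than every remaining b item, so it is not in b
        else:
            j += 1  # b[j] is smaller than every remaining a item, so it is not in a
    return total


arr1 = [[1, 10], [2, 100], [3, 1000], [4, 10000]]
arr2 = [[1, 10], [3, 100], [4, 1000], [5, 10000], [6, 99]]
print(weighted_intersection_sorted(arr1, arr2))  # 1110
```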
Example:
arr1 = [[1,10],[2,100],[3,1000],[4,10000]]
arr2 = [[1,10],[3,100],[4,1000],[5,10000],[6,99]]
It should return 1110, because items 1, 3 and 4 appear in both lists, and the smaller of their two counts is 10, 100 and 1000 respectively.
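Written as a formula, with notation introduced here for illustration (I1 and I2 are the sets of items in arr1 and arr2, and c1(x), c2(x) are the counts of item x in each), the requested total and its value for this example are:

```latex
S = \sum_{x \in I_1 \cap I_2} \min\bigl(c_1(x),\, c_2(x)\bigr)
  = \min(10, 10) + \min(1000, 100) + \min(10000, 1000)
  = 10 + 100 + 1000 = 1110
```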
Thanks for your help!
Answer:
Using plain Python (dicts and a set intersection):
a = dict(arr1)
b = dict(arr2)
sum(min(a[i], b[i]) for i in a.keys() & b.keys())
1110
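The standard library can also do this directly: `collections.Counter` supports multiset intersection with `&`, which keeps each shared key with the minimum of its two counts. This is a minimal sketch; note that `&` drops keys whose resulting count is zero or negative (fine for positive purchase counts), and `dict(arr)` keeps only the last count if an item is listed twice.

```python
from collections import Counter

arr1 = [[1, 10], [2, 100], [3, 1000], [4, 10000]]
arr2 = [[1, 10], [3, 100], [4, 1000], [5, 10000], [6, 99]]

# Counter(dict(...)) maps item -> count; "&" is multiset intersection,
# keeping each shared item with min(count1, count2).
common = Counter(dict(arr1)) & Counter(dict(arr2))
total = sum(common.values())
print(total)  # 1110
```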
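Using NumPy:
import numpy as np
np1 = np.array(arr1).T
np2 = np.array(arr2).T
# return_indices=True gives, for each common item, its position in each array,
# so the counts stay paired by item even when the inputs are unsorted
both, idx1, idx2 = np.intersect1d(np1[0], np2[0], return_indices=True)
np.minimum(np1[1, idx1], np2[1, idx2]).sum()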
1110
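If the item IDs happen to be small non-negative integers (an assumption about the data that the question does not guarantee), another vectorized option is a dense lookup table: scatter each array's counts into a zero-filled array indexed by item ID, then take the elementwise minimum. Items missing from one list have count 0 there, so they contribute nothing. This sketch also assumes counts are non-negative and that each item occurs at most once per array.

```python
import numpy as np

arr1 = [[1, 10], [2, 100], [3, 1000], [4, 10000]]
arr2 = [[1, 10], [3, 100], [4, 1000], [5, 10000], [6, 99]]

a = np.asarray(arr1)
b = np.asarray(arr2)

# Assumes item IDs are non-negative ints with a modest maximum,
# counts are non-negative, and each item occurs at most once per array.
size = max(a[:, 0].max(), b[:, 0].max()) + 1
w1 = np.zeros(size, dtype=a.dtype)
w2 = np.zeros(size, dtype=b.dtype)
w1[a[:, 0]] = a[:, 1]  # count per item ID; 0 where the item is absent
w2[b[:, 0]] = b[:, 1]

# An item absent from either list has a 0 entry, so min() adds nothing for it.
total = int(np.minimum(w1, w2).sum())
print(total)  # 1110
```

The trade-off is memory proportional to the largest item ID rather than to the list lengths, so this only makes sense when IDs are compact.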
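Using pandas:
import pandas as pd
# an inner merge on column 0 (the item) keeps only shared items; min(axis=1) takes the smaller count per row
pd.DataFrame(arr1).merge(pd.DataFrame(arr2), on=0).set_index(0).min(axis=1).sum()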
1110