I have the function below, which calculates the IoU (intersection over union) of two sets of boxes. I understand the code up to the point where "lt" and "rb" are calculated, but I have no idea what that part means. Can someone please help me? I've also calculated the IoU using code from another post/website (here), and it gives a different result from what I'm trying to achieve, as it seems to miss some items. More info can be found here, where I posted my original question and where my code and purpose are visible.
import numpy as np

def box_iou_calc(boxes1, boxes2):
    # https://github.com/pytorch/vision/blob/master/torchvision/ops/boxes.py
    """
    Return intersection-over-union (Jaccard index) of boxes.
    Both sets of boxes are expected to be in (x1, y1, x2, y2) format.
    Arguments:
        boxes1 (Array[N, 4])
        boxes2 (Array[M, 4])
    Returns:
        iou (Array[N, M]): the NxM matrix containing the pairwise
            IoU values for every element in boxes1 and boxes2
    This implementation is taken from the above link and changed so that it only uses numpy.
    """
    def box_area(box):
        # box = 4xn
        return (box[2] - box[0]) * (box[3] - box[1])

    area1 = box_area(boxes1.T)
    area2 = box_area(boxes2.T)
    lt = np.maximum(boxes1[:, None, :2], boxes2[:, :2])  # [N,M,2]
    rb = np.minimum(boxes1[:, None, 2:], boxes2[:, 2:])  # [N,M,2]
    # clip negative widths/heights to 0 so non-overlapping boxes get zero intersection
    inter = np.prod(np.clip(rb - lt, a_min=0, a_max=None), 2)
    return inter / (area1[:, None] + area2 - inter)  # iou = inter / (area1 + area2 - inter)
Answer:
The docstring says the arrays are:
boxes1 (Array[N, 4])
boxes2 (Array[M, 4])
With a bit of reading of the numpy docs, and some experimentation, it should become clear that boxes1[:, None, :2] takes the first 2 columns from boxes1 (the x1, y1 values) and adds a size 1 dimension. The resulting shape will be (N,1,2). In this context None is the same as np.newaxis.
boxes2[:, :2] is simpler; it just returns an array of shape (M,2).
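The two indexing expressions can be checked in a short session. This is only a sketch: the sample boxes (N = 3, M = 2) and the names top_left_1 and top_left_2 are made up for illustration and are not from the question.

```python
import numpy as np

# Hypothetical sample data: N = 3 and M = 2 boxes in (x1, y1, x2, y2) format.
boxes1 = np.array([[0, 0, 2, 2],
                   [1, 1, 3, 3],
                   [4, 4, 6, 6]], dtype=float)
boxes2 = np.array([[1, 0, 3, 2],
                   [0, 0, 1, 1]], dtype=float)

top_left_1 = boxes1[:, None, :2]  # first 2 columns, with a new size 1 axis in the middle
top_left_2 = boxes2[:, :2]        # first 2 columns only

print(top_left_1.shape)  # (3, 1, 2), i.e. (N, 1, 2)
print(top_left_2.shape)  # (2, 2), i.e. (M, 2)

# np.newaxis is just an alias for None, so both spellings index identically.
same_alias = np.newaxis is None
same_result = np.array_equal(boxes1[:, np.newaxis, :2], top_left_1)
print(same_alias, same_result)  # True True
```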
np.maximum uses the rules of broadcasting to combine the 2 arrays:
(N,1,2) with (M,2) => (N,1,2) with (1,M,2) => (N,M,2)
as commented. You can think of this as performing a kind of outer maximum of the 2 arrays: the (x1, y1) pair of each of the N boxes in boxes1 is compared, element by element, with the (x1, y1) pair of each of the M boxes in boxes2.
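One way to convince yourself of the broadcasting step is to compare it with explicit loops over every pair of boxes. The sketch below uses made-up sample boxes; lt_loop is a hypothetical name used only for the comparison.

```python
import numpy as np

# Hypothetical sample data: N = 3 and M = 2 boxes in (x1, y1, x2, y2) format.
boxes1 = np.array([[0, 0, 2, 2],
                   [1, 1, 3, 3],
                   [4, 4, 6, 6]], dtype=float)
boxes2 = np.array([[1, 0, 3, 2],
                   [0, 0, 1, 1]], dtype=float)

# Broadcast version, as in the question: (N,1,2) with (M,2) => (N,M,2)
lt = np.maximum(boxes1[:, None, :2], boxes2[:, :2])
print(lt.shape)  # (3, 2, 2)

# Loop version: one elementwise maximum per (box from boxes1, box from boxes2) pair
lt_loop = np.empty((len(boxes1), len(boxes2), 2))
for i, b1 in enumerate(boxes1):
    for j, b2 in enumerate(boxes2):
        lt_loop[i, j] = np.maximum(b1[:2], b2[:2])

print(np.array_equal(lt, lt_loop))  # True
```

So lt[i, j] holds the result for box i of boxes1 paired with box j of boxes2.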
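lt and rb stand for left-top and right-bottom. For each pair of boxes they hold the (x1, y1) and (x2, y2) corners of the intersection rectangle, not of the union: the intersection starts at the larger of the two left/top coordinates (hence np.maximum) and ends at the smaller of the two right/bottom coordinates (hence np.minimum). rb - lt is then the width and height of the overlap, and np.clip sets them to 0 when the boxes do not overlap.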
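To check what lt and rb hold, it helps to work through single pairs of boxes by hand. In the sketch below, pair_iou and the boxes a, b, c are hypothetical names and values chosen for illustration; the steps mirror the function in the question, just without the broadcasting. Here lt and rb come out as the left-top and right-bottom corners of the overlap between the two boxes.

```python
import numpy as np

# Hypothetical boxes in (x1, y1, x2, y2) format.
a = np.array([0.0, 0.0, 2.0, 2.0])
b = np.array([1.0, 0.0, 3.0, 2.0])  # overlaps a on the strip x in [1, 2]
c = np.array([4.0, 4.0, 6.0, 6.0])  # does not touch a at all

def pair_iou(p, q):
    """IoU of two single boxes, following the same steps as the question's code."""
    lt = np.maximum(p[:2], q[:2])               # left-top corner of the overlap
    rb = np.minimum(p[2:], q[2:])               # right-bottom corner of the overlap
    wh = np.clip(rb - lt, a_min=0, a_max=None)  # width/height, 0 if no overlap
    inter = wh[0] * wh[1]
    area_p = (p[2] - p[0]) * (p[3] - p[1])
    area_q = (q[2] - q[0]) * (q[3] - q[1])
    return inter / (area_p + area_q - inter)

# a vs b: lt = (1, 0), rb = (2, 2), overlap area = 1 * 2 = 2, union = 4 + 4 - 2 = 6
iou_ab = pair_iou(a, b)
# a vs c: rb - lt = (-2, -2); without the clip the product would wrongly be +4
iou_ac = pair_iou(a, c)

print(iou_ab)  # 2 / 6, about 0.333
print(iou_ac)  # 0.0
```

The second case shows why the clip matters: two negative differences multiply to a positive number, which would be counted as a non-zero intersection.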
When deciphering numpy code it's a good idea to have both the numpy docs at hand and an interactive session where you can test bits of code.
(I normally would illustrate this with a small example of my own, but my current computer setup doesn't let me do that.)