np.array[:,None,:2] confusion, IoU calculation issue

I have the function below to calculate the IoU (intersection over union) of two sets of boxes. I understand the code up to the point where "lt" and "rb" are calculated, but I have no idea what that part means. Can someone please help me? I've also calculated the IoU using code from another post/website, and it gives a different result from what I'm trying to achieve, as it seems to miss some items. More information is in my original question, where my code and purpose are visible.

import numpy as np


def box_iou_calc(boxes1, boxes2):
    # https://github.com/pytorch/vision/blob/master/torchvision/ops/boxes.py
    """
    Return intersection-over-union (Jaccard index) of boxes.
    Both sets of boxes are expected to be in (x1, y1, x2, y2) format.
    Arguments:
        boxes1 (Array[N, 4])
        boxes2 (Array[M, 4])
    Returns:
        iou (Array[N, M]): the NxM matrix containing the pairwise
            IoU values for every element in boxes1 and boxes2

    This implementation is taken from the above link and changed so that it only uses numpy.
    """

    def box_area(box):
        # box = 4xn
        return (box[2] - box[0]) * (box[3] - box[1])

    area1 = box_area(boxes1.T)
    area2 = box_area(boxes2.T)

    lt = np.maximum(boxes1[:, None, :2], boxes2[:, :2])  # [N,M,2]
    rb = np.minimum(boxes1[:, None, 2:], boxes2[:, 2:])  # [N,M,2]

    inter = np.prod(np.clip(rb - lt, a_min=0, a_max=None), 2)
    return inter / (area1[:, None] + area2 - inter)  # iou = inter / (area1 + area2 - inter)

CodePudding user response:

The docstring says the arrays are:

    boxes1 (Array[N, 4])
    boxes2 (Array[M, 4])

With a bit of numpy docs reading, and experimentation, it should become obvious that:

boxes1[:, None, :2]

takes the first 2 columns of boxes1 (x1 and y1) and adds a size-1 dimension in the middle. The resulting shape is (N,1,2). In this context None is the same as np.newaxis.

boxes2[:, :2]

is simpler; it just returns an array of shape (M,2).
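The two indexing expressions can be checked in an interactive session. This is a minimal sketch; the box values are made up for illustration (N = 3, M = 2) and are not from the question.

```python
import numpy as np

# Hypothetical example data in (x1, y1, x2, y2) format.
boxes1 = np.array([[0, 0, 2, 2],
                   [1, 1, 3, 3],
                   [4, 4, 5, 5]])   # N = 3
boxes2 = np.array([[1, 1, 2, 2],
                   [0, 0, 4, 4]])   # M = 2

a = boxes1[:, None, :2]   # (x1, y1) of every box, with a new size-1 axis in the middle
b = boxes2[:, :2]         # (x1, y1) of every box, no new axis

print(a.shape)  # (3, 1, 2)
print(b.shape)  # (2, 2)

# None and np.newaxis are interchangeable when indexing.
print(np.array_equal(a, boxes1[:, np.newaxis, :2]))  # True
```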

np.maximum uses the rules of broadcasting to combine the 2 arrays:

(N,1,2) with (M,2) => (N,1,2) with (1,M,2) => (N,M,2)

as commented. You can think of this as a kind of outer maximum of the 2 arrays, comparing each of the N rows of boxes1 with each of the M rows of boxes2.
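To make the "outer maximum" concrete, the broadcast call can be compared with explicit loops over every pair of rows. This is only a sketch with made-up boxes; the name lt_loop is introduced here for illustration.

```python
import numpy as np

# Hypothetical example data in (x1, y1, x2, y2) format.
boxes1 = np.array([[0, 0, 2, 2],
                   [1, 1, 3, 3],
                   [4, 4, 5, 5]])   # N = 3
boxes2 = np.array([[1, 1, 2, 2],
                   [0, 0, 4, 4]])   # M = 2

# Broadcast version: (3, 1, 2) combined with (2, 2) gives (3, 2, 2).
lt = np.maximum(boxes1[:, None, :2], boxes2[:, :2])

# Loop version: entry [i, j] compares row i of boxes1 with row j of boxes2.
lt_loop = np.empty((3, 2, 2), dtype=boxes1.dtype)
for i in range(3):
    for j in range(2):
        lt_loop[i, j] = np.maximum(boxes1[i, :2], boxes2[j, :2])

print(lt.shape)                     # (3, 2, 2)
print(np.array_equal(lt, lt_loop))  # True
```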

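lt and rb are the left-top and right-bottom corners of the intersection (not the union) of each pair of boxes: lt is the larger of the two (x1, y1) corners, and rb is the smaller of the two (x2, y2) corners. rb - lt, clipped at 0, then gives the width and height of the overlap, and their product is the intersection area.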

When deciphering numpy code it's a good idea to have both the numpy docs at hand, and an interactive session where you can test bits of code.

(I normally would illustrate this with a small example of my own, but my current computer setup doesn't let me do that.)
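A small example of that kind might look like the sketch below. The boxes are made up: the first box in boxes1 overlaps the box in boxes2 in a 1x1 square, and the second does not overlap it at all. The variable wh is introduced here for readability and is not in the question's code.

```python
import numpy as np

# Hypothetical example data in (x1, y1, x2, y2) format.
boxes1 = np.array([[0., 0., 2., 2.],
                   [4., 4., 5., 5.]])   # N = 2
boxes2 = np.array([[1., 1., 3., 3.]])   # M = 1

# Corners of the overlap rectangle for every pair of boxes.
lt = np.maximum(boxes1[:, None, :2], boxes2[:, :2])   # shape (2, 1, 2)
rb = np.minimum(boxes1[:, None, 2:], boxes2[:, 2:])   # shape (2, 1, 2)

# Width and height of the overlap; negative values mean no overlap, so clip to 0.
wh = np.clip(rb - lt, 0, None)
inter = wh[..., 0] * wh[..., 1]                        # shape (2, 1)

area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])   # [4., 1.]
area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])   # [4.]

iou = inter / (area1[:, None] + area2 - inter)
print(iou)   # first pair: 1 / (4 + 4 - 1) = 1/7, second pair: 0
```

For the first pair, lt is (1, 1) and rb is (2, 2), so the overlap is 1x1. For the second pair, rb - lt is negative in both directions, which the clip turns into an overlap of zero.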