ArUco markers, pose estimation - exactly for which point are the translation and rotation given?


I detected the ArUco marker and estimated the pose; see the image below. However, the Xt (X translation) I get is a positive value. According to the drawAxis function, the positive direction is away from the image center, so I thought it was supposed to be negative. Why am I getting a positive value instead?

My camera is about 120 mm away from the imaging surface, but I am getting Zt (Z translation) in the range of 650 mm. Is pose estimation giving the pose of the marker with respect to the physical camera or to the image plane center? I don't understand why Zt is so high.

I kept measuring the pose while changing Z, and obtained roll, pitch, and yaw. I noticed that roll (rotation w.r.t. the camera X-axis) keeps flipping its sign back and forth while its magnitude stays between 166 and 178 degrees, but the sign of Xt did not change with the sign change in roll. Any thoughts on why it behaves like that? Any suggestions for getting more consistent data?

import math

import cv2 as cv
import numpy as np
from scipy.spatial.transform import Rotation as R

# fname, aruco_marker_side_length, mtx (camera matrix), dst (distortion
# coefficients) and euler_from_quaternion() are defined elsewhere.

image = cv.imread(fname)
arucoDict = cv.aruco.Dictionary_get(cv.aruco.DICT_4X4_1000)
arucoParams = cv.aruco.DetectorParameters_create()
(corners, ids, rejected) = cv.aruco.detectMarkers(image, arucoDict,
                                                  parameters=arucoParams)
print(corners, ids, rejected)

if len(corners) > 0:
    # flatten the ArUco IDs list
    ids = ids.flatten()

    # extract the corners of the first detected marker (always returned in
    # top-left, top-right, bottom-right, bottom-left order)
    (topLeft, topRight, bottomRight, bottomLeft) = corners[0][0]

    # convert each of the (x, y)-coordinate pairs to integers
    topRight = (int(topRight[0]), int(topRight[1]))
    bottomRight = (int(bottomRight[0]), int(bottomRight[1]))
    bottomLeft = (int(bottomLeft[0]), int(bottomLeft[1]))
    topLeft = (int(topLeft[0]), int(topLeft[1]))

    # draw the bounding box of the ArUco detection
    cv.line(image, topLeft, topRight, (0, 255, 0), 2)
    cv.line(image, topRight, bottomRight, (0, 255, 0), 2)
    cv.line(image, bottomRight, bottomLeft, (0, 255, 0), 2)
    cv.line(image, bottomLeft, topLeft, (0, 255, 0), 2)

    # compute and draw the center (x, y)-coordinates of the ArUco marker
    cX = int((topLeft[0] + bottomRight[0]) / 2.0)
    cY = int((topLeft[1] + bottomRight[1]) / 2.0)
    cv.circle(image, (cX, cY), 4, (0, 0, 255), -1)

    # estimate the in-plane rotation from the corner positions
    if topLeft[1] != topRight[1] or topLeft[0] != bottomLeft[0]:
        rot1 = np.degrees(np.arctan((topLeft[0] - bottomLeft[0])
                                    / (bottomLeft[1] - topLeft[1])))
        rot2 = np.degrees(np.arctan((topRight[1] - topLeft[1])
                                    / (topRight[0] - topLeft[0])))
        rot = (np.round(rot1, 3) + np.round(rot2, 3)) / 2
        print(rot1, rot2, rot)
    else:
        rot = 0

    # draw the marker position and rotation on the image
    rotS = ", rotation: " + str(np.round(rot, 3))
    cv.putText(image, "position: " + str(cX) + "," + str(cY),
               (100, topLeft[1] - 15), cv.FONT_HERSHEY_SIMPLEX,
               0.5, (255, 0, 80), 2)
    cv.putText(image, rotS,
               (400, topLeft[1] - 15), cv.FONT_HERSHEY_SIMPLEX,
               0.5, (255, 0, 80), 2)
    print("[INFO] ArUco marker ID: {}".format(ids))

    # average length of the two diagonals, in pixels
    d = np.round((math.dist(topLeft, bottomRight)
                  + math.dist(topRight, bottomLeft)) / 2, 3)

    # get the rotation and translation vectors
    rvecs, tvecs, obj_points = cv.aruco.estimatePoseSingleMarkers(
        corners, aruco_marker_side_length, mtx, dst)

    # The pose of the marker is with respect to the camera lens frame.
    # Imagine you are looking through the camera viewfinder;
    # the camera lens frame's:
    #   x-axis points to the right
    #   y-axis points straight down towards your toes
    #   z-axis points straight ahead, away from your eye, out of the camera

    # store the translation (i.e. position) information
    transform_translation_x = tvecs[0][0][0]
    transform_translation_y = tvecs[0][0][1]
    transform_translation_z = tvecs[0][0][2]

    # store the rotation information
    rotation_matrix = np.eye(4)
    rotation_matrix[0:3, 0:3] = cv.Rodrigues(np.array(rvecs[0]))[0]
    r = R.from_matrix(rotation_matrix[0:3, 0:3])
    quat = r.as_quat()

    # quaternion format
    transform_rotation_x = quat[0]
    transform_rotation_y = quat[1]
    transform_rotation_z = quat[2]
    transform_rotation_w = quat[3]

    # Euler angle format in radians
    # (euler_from_quaternion is a helper defined elsewhere)
    roll_x, pitch_y, yaw_z = euler_from_quaternion(
        transform_rotation_x, transform_rotation_y,
        transform_rotation_z, transform_rotation_w)

    roll_x = math.degrees(roll_x)
    pitch_y = math.degrees(pitch_y)
    yaw_z = math.degrees(yaw_z)
[image: detected ArUco marker with drawn bounding box, center point and position/rotation labels]

CodePudding user response:

Without checking all the code (looks roughly okay), a few basics about OpenCV and aruco:

Both use right-handed coordinate systems: thumb = X, index = Y, middle = Z.

OpenCV uses X right, Y down, Z far for screen/camera frames. The origin for screens and images is the top-left corner. For cameras, the origin is the center of the pinhole model, which would be the center of the aperture. I can't comment on lenses or lens systems; assume the lens center is the origin, which is probably close enough.

Aruco uses X right, Y far, Z up, if the marker is lying flat on a table. Origin is in the center of the marker. The top left corner of the marker is considered the "first" corner.

The marker can be considered to have its own coordinate system/frame.
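
Concretely, here is a sketch of how the marker-local corner coordinates look for a marker of side length s, matching the detection order top-left, top-right, bottom-right, bottom-left (s = 0.05 is just an example value):

import numpy as np

s = 0.05  # example side length, e.g. 50 mm expressed in metres

# marker-local corners: origin at the center, Z = 0 in the marker plane
marker_corners_3d = np.float32([
    [-s / 2,  s / 2, 0],   # top-left (the "first" corner)
    [ s / 2,  s / 2, 0],   # top-right
    [ s / 2, -s / 2, 0],   # bottom-right
    [-s / 2, -s / 2, 0],   # bottom-left
])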

The pose given by rvec and tvec is the pose of the marker in the camera frame. That means np.linalg.norm(tvec) gives you the direct distance from the camera to the marker's center; tvec's Z is just the component parallel to the optical axis.
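
A minimal sketch of that distinction, reusing the tvecs from the question's code:

import numpy as np

# tvec of the first detected marker, from estimatePoseSingleMarkers above
tvec = tvecs[0][0]

direct_distance = np.linalg.norm(tvec)  # camera origin to marker center
z_component = tvec[2]                   # distance along the optical axis only

# for a marker away from the image center, direct_distance > z_component
print(direct_distance, z_component)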

If the marker is in the right half of the picture ("half" defined by the camera matrix's cx, cy), you'd expect tvec's X to be positive and to grow as the marker moves further right. In the lower half, Y is positive and grows downward.

Conversely, that transformation maps marker-local coordinates to camera-local coordinates. Try transforming some marker-local points, such as the origin or points on the axes; I believe cv::transform can help with that. Using OpenCV's projectPoints to map 3D points to 2D image points, you can then draw the marker's axes, a cube on top of it, or anything you like.
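
For example, a sketch of drawing the marker's axes with projectPoints, reusing mtx, dst, rvecs, tvecs and aruco_marker_side_length from the question's code:

import cv2 as cv
import numpy as np

s = aruco_marker_side_length / 2.0  # half side, in marker-local units

# marker-local points: origin plus the tips of the X, Y, Z axes
axis_points = np.float32([[0, 0, 0], [s, 0, 0], [0, s, 0], [0, 0, s]])

# map the 3D marker-local points into 2D image points using the pose
img_points, _ = cv.projectPoints(axis_points, rvecs[0], tvecs[0], mtx, dst)
img_points = img_points.reshape(-1, 2)

o, x, y, z = [tuple(map(int, p)) for p in img_points]
cv.line(image, o, x, (0, 0, 255), 2)   # X axis in red
cv.line(image, o, y, (0, 255, 0), 2)   # Y axis in green
cv.line(image, o, z, (255, 0, 0), 2)   # Z axis in blue

Newer OpenCV versions also provide cv.drawFrameAxes, which does the same in a single call.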

Say the marker sits upright and faces the camera dead-on. When you consider the frame triads of the marker and the camera in space ("world" space), both would have X pointing "right", but each one's Y and Z point opposite to the other's, so you'd expect to see a rotation around the X axis by half a turn (flipping Z and Y). A quick numeric check of this follows the list below.

You could imagine the transformation to happen like this:

  • initially the camera looks through the marker, from the marker's back out into the world. The camera would be "upside down". The camera sees marker-space.
  • the pose's rotation component rotates the whole marker-local world around the camera's origin. Seen from the world frame (point of reference), the camera rotates, into an attitude you'd find natural.
  • the pose's translation moves the marker's world out in front of the camera (Z being positive), or equivalently, the camera backs away from the marker.
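
Here is a minimal check of that half-turn, reusing rvecs from the question's code (scipy's Rotation is assumed available, as in the question). It also suggests why the roll values of ±166-178 degrees flip sign:

import cv2 as cv
import numpy as np
from scipy.spatial.transform import Rotation as R

# rvec of the first marker, from estimatePoseSingleMarkers in the question
rot_matrix, _ = cv.Rodrigues(np.array(rvecs[0]))
euler_deg = R.from_matrix(rot_matrix).as_euler('xyz', degrees=True)

# for a marker facing the camera dead-on, expect roughly [180, 0, 0]:
# a half-turn about X. Since +180 and -180 degrees are the same rotation,
# roll can flip sign between frames while the physical pose barely changes.
print(euler_deg)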

If you get implausible values, check aruco_marker_side_length and the camera matrix. The focal length f (in pixels) should be around 500-3000 for typical resolutions (VGA to 4K) and fields of view (60-80 degrees).
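
For reference, a small sketch of the pinhole relation behind that rule of thumb (the resolution and field of view below are example values, not taken from the question):

import math

image_width_px = 1920      # example resolution
horizontal_fov_deg = 70    # example field of view

# pinhole model: f = (w / 2) / tan(FOV / 2)
f = (image_width_px / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)
print(f)  # ~1371 px, comfortably inside the 500-3000 range quoted above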
