Recreating Global 3D Points, from Local 3D Points and Global 2D Points; SolvePnP

Time:03-11

Aloha, I have a list of 2D keypoints located in the global frame (image points), and a list of corresponding 3D keypoints in the local frame (often called texture or object points). The image points range over x in [0, 1920] and y in [0, 1080], and the object points lie within x in [-1, 1] and y in [-1, 1]. I have followed the approach described in this paper on page 6 with the tutorial from here, but the output of my 3D points is not correct at all; the movement of the points is all over the place. Below is my approach using SolvePnP. Am I on the wrong track here, since SolvePnP is normally used for detecting camera movement (open to other suggestions!), or is my method wrong?

import numpy as np
import cv2
array = np.array # convenience

frame1_2d = \
array([[1033.9708251953125 ,  344.23065185546875],
       [1077.796630859375  ,  617.1146240234375 ],
       [ 958.2716674804688 ,  609.1179809570312 ],
       [1074.8084716796875 ,  782.0444946289062 ],
       [ 975.2044067382812 ,  418.1991882324219 ],
       [1024.0103759765625 ,  931.980712890625  ],
       [1122.6185302734375 ,  605.1196899414062 ],
       [1096.721435546875  ,  418.1991882324219 ],
       [ 999.109375        ,  617.1146240234375 ],
       [ 962.255859375     ,  518.1566772460938 ],
       [1111.662109375     ,  517.1571044921875 ],
       [1014.0499877929688 ,  782.0444946289062 ],
       [1061.8599853515625 ,  930.9811401367188 ]])
frame1_3d = \
array([[-0.01265097688883543   , -0.4992150068283081    , -0.11455678939819336   ],
       [ 0.10584918409585953   , -0.0018199272453784943 ,  0.0023642126470804214 ],
       [-0.14271944761276245   ,  0.06332945823669434   ,  0.1438678503036499    ],
       [ 0.09254898130893707   ,  0.3176574409008026    , -0.17930322885513306   ],
       [-0.1155640035867691    , -0.4058316648006439    ,  0.00021289288997650146],
       [-0.03301446512341499   ,  0.6519031524658203    , -0.3515356183052063    ],
       [ 0.14540529251098633   ,  0.05645819008350372   ,  0.10776595026254654   ],
       [ 0.10836226493120193   , -0.4078497290611267    ,  0.000870194286108017  ],
       [-0.10584865510463715   ,  0.001818838994950056  , -0.0023612845689058304 ],
       [-0.1546039581298828    , -0.17418316006660461   ,  0.10266228020191193   ],
       [ 0.1590884029865265    , -0.17913128435611725   ,  0.09423552453517914   ],
       [-0.0736076831817627    ,  0.3179360628128052    , -0.17892584204673767   ],
       [ 0.05236409604549408   ,  0.6490492820739746    , -0.33908188343048096   ]])

frame2_2d = \
array([[1028.110107421875  ,  327.7352600097656 ],
       [1068.0904541015625 ,  606.7128295898438 ],
       [ 982.1328125       ,  229.74314880371094],
       [1071.0889892578125 ,  778.698974609375  ],
       [ 979.13427734375   ,  403.7291564941406 ],
       [1013.1174926757812 ,  933.6865234375    ],
       [1069.0899658203125 ,  243.7420196533203 ],
       [1080.08447265625   ,  403.7291564941406 ],
       [ 997.1254272460938 ,  616.7119750976562 ],
       [ 983.13232421875   ,  312.7364501953125 ],
       [1071.0889892578125 ,  317.7360534667969 ],
       [1005.1214599609375 ,  778.698974609375  ],
       [1061.0938720703125 ,  936.686279296875  ]])

frame2_3d = \
array([[-0.0004756036214530468, -0.5245562791824341   , -0.010652128607034683 ],
       [ 0.10553547739982605  , -0.00272204983048141  ,  0.0024587283842265606],
       [-0.1196068525314331   , -0.6828885078430176   , -0.14210689067840576  ],
       [ 0.0845363438129425   ,  0.38039350509643555  , -0.028144780546426773 ],
       [-0.11286421865224838  , -0.4302292466163635   ,  0.06919233500957489  ],
       [-0.030065223574638367 ,  0.754790186882019    ,  0.012936152517795563 ],
       [ 0.1010960042476654   , -0.6289429664611816   , -0.11814753711223602  ],
       [ 0.1058841198682785   , -0.4253752827644348   ,  0.08086629956960678  ],
       [-0.10553570091724396  ,  0.002716599963605404 , -0.0024500866420567036],
       [-0.127223938703537    , -0.5319695472717285   , -0.09722068160772324  ],
       [ 0.11508879065513611  , -0.49151480197906494  , -0.07002018392086029  ],
       [-0.06679684668779373  ,  0.38714516162872314  , -0.023669833317399025 ],
       [ 0.05081187188625336  ,  0.7544023990631104   , -0.011078894138336182 ]])

frame3_2d = \
array([[1027.91845703125   ,  338.2441711425781 ],
       [1067.8787841796875 ,  612.0115356445312 ],
       [ 803.141357421875  ,  500.10662841796875],
       [1070.8758544921875 ,  776.8713989257812 ],
       [ 968.9768676757812 ,  413.18048095703125],
       [1012.9332885742188 ,  925.7449340820312 ],
       [1248.699462890625  ,  491.1142578125    ],
       [1089.8570556640625 ,  412.18133544921875],
       [ 995.9501342773438 ,  611.0123901367188 ],
       [ 871.073974609375  ,  461.1397399902344 ],
       [1181.765869140625  ,  454.14569091796875],
       [1003.9421997070312 ,  775.8722534179688 ],
       [1061.884765625     ,  933.7380981445312 ]])

frame3_3d = \
array([[-0.003511453978717327  , -0.5015891194343567    , -0.10520103573799133   ],
       [ 0.10480749607086182   , -0.00019206921570003033, -0.0004397481679916382 ],
       [-0.47764456272125244   , -0.1816674768924713    ,  0.04093759506940842   ],
       [ 0.0936243087053299    ,  0.3628539443016052    , -0.09391097724437714   ],
       [-0.11445926129817963   , -0.41107428073883057   ,  0.01644478738307953   ],
       [-0.03567686676979065   ,  0.720417320728302     , -0.10493464022874832   ],
       [ 0.4529808759689331    , -0.18383921682834625   , -0.02210136130452156   ],
       [ 0.1092790886759758    , -0.41095152497291565   ,  0.011709243059158325  ],
       [-0.10480757057666779   ,  0.00018716813065111637,  0.0004445519298315048 ],
       [-0.3031604290008545    , -0.2810187041759491    ,  0.07747684419155121   ],
       [ 0.3006024956703186    , -0.28319910168647766   ,  0.043038371950387955  ],
       [-0.07087739557027817   ,  0.35837966203689575   , -0.08430898934602737   ],
       [ 0.062416717410087585  ,  0.7248380780220032    , -0.13536334037780762   ]])

#frame1_2d = np.asarray(frame1_2d, dtype=float)
#frame1_3d = np.asarray(frame1_3d, dtype=float)
#frame2_2d = np.asarray(frame2_2d, dtype=float)
#frame2_3d = np.asarray(frame2_3d, dtype=float)
#frame3_2d = np.asarray(frame3_2d, dtype=float)
#frame3_3d = np.asarray(frame3_3d, dtype=float)

# Globalize 3D Points
dist_coeffs = (0.11480806073904032, -0.21946985653851792, 0.0012002116999769957, 0.008564577708855225, 0.11274677130853494)
camera_matrix = np.asarray([
    [1394.6027293299926, 0.0, 995.588675691456],
    [0.0, 1394.6027293299926, 599.3212928484164],
    [0.0, 0.0, 1]
])


# Estimate the pose (rotation + translation) of the object points for one frame.
# The pose must come from the same frame as the points it is applied to below.
success, rotation_vector, translation_vector = cv2.solvePnP(
    frame1_3d, frame1_2d, camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)

# Build the 4x4 homogeneous transform [[R, t], [0, 0, 0, 1]]
transform = np.eye(4)
transform[:3, :3], _ = cv2.Rodrigues(rotation_vector)  # Rodrigues returns (matrix, jacobian)
transform[:3, 3] = translation_vector.ravel()

# Apply the transform to the points in homogeneous coordinates (x, y, z, 1)
globalized_3d = np.c_[frame1_3d, np.ones((len(frame1_3d), 1))]
globalized_3d = (transform @ globalized_3d.T).T
print(globalized_3d)

Thanks in advance, appreciate any help! Edit: Included some examples in my code, after improving the things suggested by top answer

CodePudding user response:

  1. yes, solvePnP is okay to use
  2. yes, your math is wrong

I'll assume that you get your points from a face landmark detector, so they have a fixed order. I'll also assume that your 3D model points are given in the same order and their values are consistent and somewhat similar to the face you look at. You should exclude points that denote flesh and mandible (as opposed to skull bone). You actually want to track the skull, not the position of lips and jaws that move all over the place.
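Restricting the correspondences to rigid landmarks can be sketched as below. The index list is a pure placeholder (which landmarks are rigid depends on the detector in use), and `select_rigid` is a hypothetical helper name, not part of OpenCV.

```python
import numpy as np

# Placeholder indices of landmarks assumed to sit on rigid bone.
# Replace with the indices that apply to your landmark detector.
RIGID_IDX = [0, 1, 4, 7, 8, 9, 10]

def select_rigid(points_3d, points_2d, idx=RIGID_IDX):
    """Keep only the rows listed in idx, in both arrays, in the same order."""
    points_3d = np.asarray(points_3d, dtype=float)
    points_2d = np.asarray(points_2d, dtype=float)
    if len(points_3d) != len(points_2d):
        raise ValueError("3D and 2D point lists must have the same length")
    return points_3d[idx], points_2d[idx]

# Example with dummy data shaped like the arrays in the question (13 points).
dummy_3d = np.arange(39, dtype=float).reshape(13, 3)
dummy_2d = np.arange(26, dtype=float).reshape(13, 2)
rigid_3d, rigid_2d = select_rigid(dummy_3d, dummy_2d)
print(rigid_3d.shape, rigid_2d.shape)
```

The filtered arrays would then be passed to solvePnP in place of the full lists, so that row i of the 3D array still corresponds to row i of the 2D array.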

rvec is an axis-angle encoding. Its length is the amount of rotation in radians (expected between 0 and pi, roughly 3.14) and its direction is the axis of rotation.
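To make the encoding concrete, here is a small NumPy-only sketch that splits a rotation vector into its axis and angle. The helper name `axis_angle_from_rvec` is made up for illustration, and the zero-rotation fallback axis is an arbitrary choice.

```python
import numpy as np

def axis_angle_from_rvec(rvec):
    """Split a rotation vector into (unit axis, angle in radians)."""
    rvec = np.asarray(rvec, dtype=float).ravel()
    angle = np.linalg.norm(rvec)
    if angle < 1e-12:
        # No rotation: the axis is undefined, so return x by convention.
        return np.array([1.0, 0.0, 0.0]), 0.0
    return rvec / angle, angle

# Example: a quarter turn about the z axis.
axis, angle = axis_angle_from_rvec([0.0, 0.0, np.pi / 2])
print(axis, np.degrees(angle))
```

Printing the angle in degrees for each frame is a quick sanity check: a pose that jumps by large angles between consecutive frames usually points at bad correspondences rather than real motion.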

Use cv.Rodrigues to turn the rvec into a 3x3 rotation matrix.
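For reference, the conversion can be written out in plain NumPy with Rodrigues' rotation formula. This is only an illustrative stand-in (the name `rodrigues_to_matrix` is hypothetical); in practice you would call cv.Rodrigues(rvec), which returns a (matrix, jacobian) pair, so take the first element.

```python
import numpy as np

def rodrigues_to_matrix(rvec):
    """Axis-angle vector -> 3x3 rotation matrix via Rodrigues' formula."""
    rvec = np.asarray(rvec, dtype=float).ravel()
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)  # no rotation
    kx, ky, kz = rvec / theta
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.array([[0.0, -kz,  ky],
                  [ kz, 0.0, -kx],
                  [-ky,  kx, 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

# Example: 90 degrees about z maps the x axis onto the y axis.
R = rodrigues_to_matrix([0.0, 0.0, np.pi / 2])
print(np.round(R, 6))
```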

In fact, just build yourself some functions that take rvec and tvec and build a 4x4 matrix. Extending all points to be (x,y,z,1) is a hassle but only once.
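A minimal sketch of such helpers follows. The function names are made up for illustration, and the rotation matrix R is assumed to come from cv.Rodrigues(rvec)[0]; it is passed in directly here so the example runs with NumPy alone.

```python
import numpy as np

def make_transform(R, tvec):
    """Build the 4x4 homogeneous matrix [[R, t], [0, 0, 0, 1]]."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(R, dtype=float)
    T[:3, 3] = np.asarray(tvec, dtype=float).ravel()
    return T

def to_homogeneous(points):
    """(N, 3) points -> (N, 4) by appending a column of ones."""
    points = np.asarray(points, dtype=float)
    return np.hstack([points, np.ones((len(points), 1))])

def apply_transform(T, points):
    """Transform (N, 3) points with a 4x4 matrix and return (N, 3)."""
    return (T @ to_homogeneous(points).T).T[:, :3]

# Example: rotate 90 degrees about z, then shift 5 units along z.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
T = make_transform(R, [0.0, 0.0, 5.0])
moved = apply_transform(T, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(moved)
```

Keeping the homogeneous bookkeeping inside `apply_transform` means the rest of the code can stay with plain (N, 3) arrays.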

And make sure you use @ for matrix multiplication (or np.dot, np.matmul, ...) because * is element-wise multiplication.
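The difference is easy to see on a small example (the values are chosen only for illustration):

```python
import numpy as np

# Rotation by 90 degrees about the z axis.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
p = np.array([1.0, 2.0, 3.0])

rotated = R @ p      # matrix-vector product: a rotated 3-vector
elementwise = R * p  # broadcasting: every row of R scaled by p, still 3x3

print(rotated)       # shape (3,)
print(elementwise)   # shape (3, 3), not a rotated point
```

Note that `*` does not raise an error here because of broadcasting, which is why this mistake tends to go unnoticed until the output looks wrong.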
