As part of a hand gesture recognition system based on OpenCV and MediaPipe (from Google), I investigated the achievable frame rate. Approach 1 (adapted partly from a YouTube video and partly from MediaPipe example code) uses a frame counter and the total elapsed time to calculate the frame rate. Approach 2 uses a slightly different method: the start and end times around each frame's processing give the time taken for that frame, and hence an instantaneous frame rate. Each instantaneous value is appended to a list, which is averaged to give the average frame rate over the last 10 frames.
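In shorthand, the two calculations reduce to the following (a runnable toy sketch of just the arithmetic; the sleep stands in for one frame's processing and is not part of the real code below):

import time
import numpy as np

last_time = time.time()   # Approach 1: counts frames against total elapsed time
frames = 0
fps = []                  # Approach 2: rolling list of instantaneous rates

for _ in range(20):
    start = time.time()   # Approach 2 start time
    time.sleep(0.02)      # stand-in for reading and processing one frame
    end = time.time()     # Approach 2 end time

    frames += 1
    fps_1 = np.around(frames / (time.time() - last_time), 1)  # Approach 1

    fps.append(1 / (end - start))  # Approach 2 instantaneous rate
    fps = fps[-10:]                # average over the last 10 frames
    fps_2 = np.around(sum(fps) / len(fps))

print(fps_1, fps_2)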
Approach 1 on my system achieves a frame rate of 30fps (some settling time is needed before this value is reached, say 5-10 secs). Approach 2 achieves 100-110fps while the if results.multi_hand_landmarks line is false; when it is true, the frame rate drops to about 60fps (double Approach 1).
The frame rate for Approach 1 doesn't vary with results.multi_hand_landmarks, but it is reduced by increasing the value passed to cv2.waitKey(5). Approach 2 behaves differently when the same value is increased: the frame rate rises to a point where a maximum is reached, then drops off with any further increase in the wait time.
I suspect the correct frame rate is 30fps based on the camera's specs (see below), but that doesn't explain the values from Approach 2.
I have worked through both approaches to eliminate other sources that might affect the frame rate, but it soon became obvious (to me at least) that the difference lies in the method used to calculate the frame rate. So I ask if anyone can shed some light on why the two approaches produce different results. To me the logic of each method seems valid, but I must be missing something.
Windows 10; Python - 3.7.9; OpenCV - 4.5.4; MediaPipe - 0.8.9; CPU driven (no GPU); Webcam - Logitech C920 (30fps @ 720p/1080p)
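(Not part of my original test code: as a sanity check on the camera spec, OpenCV can report the backend's nominal frame rate. A minimal sketch, assuming the DirectShow backend exposes CAP_PROP_FPS; some backends return 0 for this property.)

import cv2

cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)
# Ask the capture backend for its nominal frame rate (may be 0 if unsupported).
reported_fps = cap.get(cv2.CAP_PROP_FPS)
print("Camera-reported FPS:", reported_fps)  # expected ~30 for a C920
cap.release()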
Approach 1
import cv2
import mediapipe as mp
import numpy as np
import sys
import time


def main():
    cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)
    # start processing the video feed
    capture(cap)


def capture(cap):
    mpHands = mp.solutions.hands
    hands = mpHands.Hands(min_detection_confidence=0.91, min_tracking_confidence=0.91)
    # used for displaying hand and other information in separate window
    mpDraw = mp.solutions.drawing_utils
    # initialize time and frame count variables
    last_time = time.time()
    frames = 0
    while True:
        # blocks until the entire frame is read
        success, img = cap.read()
        # flipped for mirror-style display in separate window
        img = cv2.cvtColor(cv2.flip(img, 1), cv2.COLOR_BGR2RGB)
        # process the image
        results = hands.process(img)
        img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
        # if results and landmarks exist process as needed
        if results.multi_hand_landmarks:
            for handLms in results.multi_hand_landmarks:
                # used for displaying hand and landmarks in separate window
                mpDraw.draw_landmarks(img, handLms, mpHands.HAND_CONNECTIONS)
                # Other code goes here to process landmarks
        # compute fps: frame count divided by total time since the loop started
        frames += 1
        delta_time = time.time() - last_time
        cur_fps = np.around(frames / delta_time, 1)
        # used for displaying hand and other information in separate window
        cv2.putText(img, 'FPS: ' + str(cur_fps), (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)
        cv2.imshow("Image", img)
        if cv2.waitKey(5) & 0xFF == 27:
            break


if __name__ == "__main__":
    main()
Approach 2
import cv2
import mediapipe as mp
import numpy as np
import sys
import time


def main():
    cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)
    # start processing the video feed
    capture(cap)


def capture(cap):
    mpHands = mp.solutions.hands
    hands = mpHands.Hands(min_detection_confidence=0.91, min_tracking_confidence=0.91)
    # used for displaying hand and other information in separate window
    mpDraw = mp.solutions.drawing_utils
    # Initialise list to hold instantaneous frame rates
    fps = [0]
    while True:
        # start time
        start = time.time()
        # blocks until the entire frame is read
        success, image = cap.read()
        # used for displaying hand and other information in separate window
        imageRGB = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        results = hands.process(imageRGB)
        # if results and landmarks exist process as needed
        if results.multi_hand_landmarks:
            for handLms in results.multi_hand_landmarks:
                # used for displaying hand and landmarks in separate window
                mpDraw.draw_landmarks(image, handLms, mpHands.HAND_CONNECTIONS)
                # Other code goes here to process landmarks
        # Define end time
        end = time.time()
        # Elapsed time between frames
        elapsed_time = end - start
        if elapsed_time != 0:
            # Calculate the current instantaneous frame rate
            cur_fps = 1 / elapsed_time
            # Append to end of list
            fps.append(cur_fps)
        # Maintain length of list
        if len(fps) == 10:
            del fps[0]
        # Calculate the average frame rate
        ave_fps = np.around(sum(fps) / len(fps))
        # used for displaying hand and other information in separate window
        cv2.putText(image, 'FPS: ' + str(ave_fps), (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)
        cv2.imshow("Image", image)
        if cv2.waitKey(35) & 0xFF == 27:
            break
    cap.release()


if __name__ == "__main__":
    main()
CodePudding user response:
Of course the results differ between #1 and #2.
In #1, these lines are also counted in delta_time:
cur_fps = np.around(frames / delta_time, 1)
# used for displaying hand and other information in separate window
cv2.putText(img, 'FPS: ' + str(cur_fps), (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)
cv2.imshow("Image", img)
if cv2.waitKey(5) & 0xFF == 27:
    break
which makes your FPS more valid, because it covers the whole loop time.
In #2, you're not counting the FPS fully, because the equivalent lines (putText, imshow, and waitKey) fall outside the interval between start and end.
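One way to repair Approach 2 in the same spirit (a sketch, not the poster's code; do_frame_work is a hypothetical stand-in for the capture, inference, and display work) is to take one timestamp per loop and difference consecutive readings, so the measured interval covers the whole iteration, including waitKey:

import time
import numpy as np

def do_frame_work():
    # Hypothetical stand-in for one iteration: cap.read(), hands.process(),
    # draw_landmarks, putText, imshow AND waitKey.
    time.sleep(0.03)

prev = time.time()
fps = []
for _ in range(50):
    do_frame_work()
    now = time.time()
    elapsed = now - prev   # loop-to-loop delta: covers the whole iteration
    prev = now
    if elapsed > 0:
        fps.append(1 / elapsed)
        fps = fps[-10:]    # rolling window of the last 10 samples
ave_fps = np.around(sum(fps) / len(fps))
print("Average FPS over the last", len(fps), "frames:", ave_fps)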