As part of a hand gesture recognition system based on OpenCV and MediaPipe (from Google), I investigated the achievable frame rate. Approach 1 (adapted partly from a YouTube video and partly from MediaPipe example code) uses a frame counter and the total elapsed time to calculate the frame rate. Approach 2 uses a slightly different method: the start and end times around each frame's processing give the time taken for that frame, and hence an instantaneous frame rate. Each instantaneous value is appended to a list, which is averaged to give the average frame rate over the last 10 frames.
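In shorthand, the two calculations reduce to the following (a runnable toy sketch of just the arithmetic; the sleep stands in for one frame's processing and is not part of the real code below):

import time
import numpy as np

last_time = time.time()   # Approach 1: counts frames against total elapsed time
frames = 0
fps = []                  # Approach 2: rolling list of instantaneous rates

for _ in range(20):
    start = time.time()   # Approach 2 start time
    time.sleep(0.02)      # stand-in for reading and processing one frame
    end = time.time()     # Approach 2 end time

    frames += 1
    fps_1 = np.around(frames / (time.time() - last_time), 1)  # Approach 1

    fps.append(1 / (end - start))  # Approach 2 instantaneous rate
    fps = fps[-10:]                # average over the last 10 frames
    fps_2 = np.around(sum(fps) / len(fps))

print(fps_1, fps_2)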
Approach 1 on my system achieves a frame rate of 30fps (some settling time is needed before this value is reached, say 5-10 secs). Approach 2 achieves 100-110fps while the if results.multi_hand_landmarks line is false; when it is true, the frame rate drops to about 60fps (double Approach 1).
The frame rate for Approach 1 doesn't vary with results.multi_hand_landmarks, but it is reduced by increasing the value passed to cv2.waitKey(5). Approach 2 behaves differently when the same value is increased: the frame rate rises to a point where a maximum is reached, then drops off with any further increase in the wait time.
I suspect the correct frame rate is 30fps based on the camera's specs (see below), but that doesn't explain the values from Approach 2.
I have worked through both approaches to eliminate other sources that might affect the frame rate, but it soon became obvious (to me at least) that the difference lies in the method used to calculate the frame rate. So I ask if anyone can shed some light on why the two approaches produce different results. To me the logic of each method seems valid, but I must be missing something.
Windows 10; Python - 3.7.9; OpenCV - 4.5.4; MediaPipe - 0.8.9; CPU driven (no GPU); Webcam - Logitech C920 (30fps @ 720p/1080p)
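(Not part of my original test code: as a sanity check on the camera spec, OpenCV can report the backend's nominal frame rate. A minimal sketch, assuming the DirectShow backend exposes CAP_PROP_FPS; some backends return 0 for this property.)

import cv2

cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)
# Ask the capture backend for its nominal frame rate (may be 0 if unsupported).
reported_fps = cap.get(cv2.CAP_PROP_FPS)
print("Camera-reported FPS:", reported_fps)  # expected ~30 for a C920
cap.release()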
Approach 1
import cv2
import mediapipe as mp
import numpy as np
import sys
import time


def main():
    cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)
    # start processing the video feed
    capture(cap)


def capture(cap):
    mpHands = mp.solutions.hands
    hands = mpHands.Hands(min_detection_confidence=0.91, min_tracking_confidence=0.91)
    # used for displaying hand and other information in separate window
    mpDraw = mp.solutions.drawing_utils
    # initialize time and frame count variables
    last_time = time.time()
    frames = 0
    while True:
        # blocks until the entire frame is read
        success, img = cap.read()
        # flipped for mirror-style display in separate window
        img = cv2.cvtColor(cv2.flip(img, 1), cv2.COLOR_BGR2RGB)
        # process the image
        results = hands.process(img)
        img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
        # if results and landmarks exist process as needed
        if results.multi_hand_landmarks:
            for handLms in results.multi_hand_landmarks:
                # used for displaying hand and landmarks in separate window
                mpDraw.draw_landmarks(img, handLms, mpHands.HAND_CONNECTIONS)
                # Other code goes here to process landmarks
        # compute fps: frame count divided by total time since the loop started
        frames += 1
        delta_time = time.time() - last_time
        cur_fps = np.around(frames / delta_time, 1)
        # used for displaying hand and other information in separate window
        cv2.putText(img, 'FPS: ' + str(cur_fps), (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)
        cv2.imshow("Image", img)
        if cv2.waitKey(5) & 0xFF == 27:
            break


if __name__ == "__main__":
    main()
Approach 2
import cv2
import mediapipe as mp
import numpy as np
import sys
import time


def main():
    cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)
    # start processing the video feed
    capture(cap)


def capture(cap):
    mpHands = mp.solutions.hands
    hands = mpHands.Hands(min_detection_confidence=0.91, min_tracking_confidence=0.91)
    # used for displaying hand and other information in separate window
    mpDraw = mp.solutions.drawing_utils
    # Initialise list to hold instantaneous frame rates
    fps = [0]
    while True:
        # start time
        start = time.time()
        # blocks until the entire frame is read
        success, image = cap.read()
        # used for displaying hand and other information in separate window
        imageRGB = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        results = hands.process(imageRGB)
        # if results and landmarks exist process as needed
        if results.multi_hand_landmarks:
            for handLms in results.multi_hand_landmarks:
                # used for displaying hand and landmarks in separate window
                mpDraw.draw_landmarks(image, handLms, mpHands.HAND_CONNECTIONS)
                # Other code goes here to process landmarks
        # Define end time
        end = time.time()
        # Elapsed time between frames
        elapsed_time = end - start
        if elapsed_time != 0:
            # Calculate the current instantaneous frame rate
            cur_fps = 1 / elapsed_time
            # Append to end of list
            fps.append(cur_fps)
        # Maintain length of list
        if len(fps) == 10:
            del fps[0]
        # Calculate the average frame rate
        ave_fps = np.around(sum(fps) / len(fps))
        # used for displaying hand and other information in separate window
        cv2.putText(image, 'FPS: ' + str(ave_fps), (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)
        cv2.imshow("Image", image)
        if cv2.waitKey(35) & 0xFF == 27:
            break
    cap.release()


if __name__ == "__main__":
    main()
CodePudding user response:
Of course the results differ between #1 and #2.
In #1, these lines are also counted in delta_time:
cur_fps = np.around(frames / delta_time, 1)
# used for displaying hand and other information in separate window
cv2.putText(img, 'FPS: ' + str(cur_fps), (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)
cv2.imshow("Image", img)
if cv2.waitKey(5) & 0xFF == 27:
    break
which makes your FPS more valid, because it covers the whole loop time.
In #2, you're not counting the FPS fully, because the equivalent lines (putText, imshow, and waitKey) fall outside the interval between start and end.
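One way to repair Approach 2 in the same spirit (a sketch, not the poster's code; do_frame_work is a hypothetical stand-in for the capture, inference, and display work) is to take one timestamp per loop and difference consecutive readings, so the measured interval covers the whole iteration, including waitKey:

import time
import numpy as np

def do_frame_work():
    # Hypothetical stand-in for one iteration: cap.read(), hands.process(),
    # draw_landmarks, putText, imshow AND waitKey.
    time.sleep(0.03)

prev = time.time()
fps = []
for _ in range(50):
    do_frame_work()
    now = time.time()
    elapsed = now - prev   # loop-to-loop delta: covers the whole iteration
    prev = now
    if elapsed > 0:
        fps.append(1 / elapsed)
        fps = fps[-10:]    # rolling window of the last 10 samples
ave_fps = np.around(sum(fps) / len(fps))
print("Average FPS over the last", len(fps), "frames:", ave_fps)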