cv2.VideoCapture inconsistent behavior between cap.set and loop read

I was trying to read the 600th frame of a video using cv2.VideoCapture. However, I found that the following two methods both successfully read an image but the images are different. I was wondering which is the correct way to read the 600th frame, and why the resultant images are different? Is it related to mp4 encoding? Thanks!

Method #1

import cv2

cap = cv2.VideoCapture("test.mp4")
print(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # 1187
cap.set(cv2.CAP_PROP_POS_FRAMES, 600)  # cv2.CAP_PROP_POS_FRAMES == 1
ret, frame1 = cap.read()  # Read the frame

Method #2

cap = cv2.VideoCapture("test.mp4")
print(cap.get(cv2.CAP_PROP_FRAME_COUNT)) # 1187
for i in range(601):
    ret, frame2 = cap.read()  # Read the frame

CodePudding user response:

To read/obtain the Xth frame of a video or similarly determine the number of frames in a video file, there are two methods:

Method #1: Utilize built-in OpenCV properties to access the video file's meta information, which is fast and efficient but can be inaccurate
Method #2: Manually loop over each frame in the video file with a counter, which is slow and inefficient but accurate

Method #1 is fast and relies on OpenCV's video property functionality, which determines the frame information in a video file almost instantaneously. However, there is an accuracy trade-off, since the result depends on your OpenCV version, the backend in use, and the video codec. From the documentation:

Reading / writing properties involves many layers. Some unexpected result might happen along this chain. Effective behavior depends from device hardware, driver and API Backend.

On the other hand, manually counting each frame until we reach the desired frame number will be 100% accurate, although it will be significantly slower. Here's an example that demonstrates the inconsistent behavior between the two methods. It attempts Method #1 by default and automatically falls back to Method #2 if that fails.

def frame_count(video_path, manual=False):
    def manual_count(handler):
        frames = 0
        while True:
            status, frame = handler.read()
            if not status:
                break
            frames += 1
        return frames

    cap = cv2.VideoCapture(video_path)
    # Slow, inefficient but 100% accurate method 
    if manual:
        frames = manual_count(cap)
    # Fast, efficient but inaccurate method
    else:
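        frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        # cap.get() returns 0 instead of raising when the backend
        # cannot supply the property, so check the value explicitly
        if frames <= 0:
            frames = manual_count(cap)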
    cap.release()
    return frames

Benchmarks

if __name__ == '__main__':
    import timeit
    import cv2

    start = timeit.default_timer()
    print('frames:', frame_count('testtest.mp4', manual=False))
    print(timeit.default_timer() - start, '(s)')

    start = timeit.default_timer()
    print('frames:', frame_count('testtest.mp4', manual=True))
    print(timeit.default_timer() - start, '(s)')

Method #1 results

frames: 3671
0.018054921 (s)

Method #2 results

frames: 3521
9.447095287 (s)

Notice how the two methods differ by 150 frames and Method #2 is significantly slower than Method #1. In general, if you need speed and are willing to sacrifice accuracy, use Method #1. In situations where you're fine with a delay but need the exact frame, use Method #2.
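As a rough sketch of Method #2 applied to reading a single frame rather than counting, the helper below steps through the video until it reaches the requested index. The function name read_frame_sequential is made up for this illustration, and it assumes cap is a freshly opened cv2.VideoCapture that has not been read from yet. It uses cap.grab() for the frames it skips, which advances the stream without the retrieve step that cap.read() performs, so it should be somewhat cheaper than calling read() every time, though every frame is still visited.

```python
def read_frame_sequential(cap, index):
    """Return the frame at 0-based position `index` by stepping from the start.

    `cap` is assumed to be a newly opened cv2.VideoCapture (hypothetical helper,
    not part of OpenCV). Returns None if the video ends before `index`.
    """
    if index < 0:
        raise ValueError("index must be non-negative")

    # Advance past the frames we do not need; grab() moves to the next
    # frame without returning it as an image.
    for _ in range(index):
        if not cap.grab():
            return None  # stream ended early

    # read() grabs and retrieves the frame we actually want.
    ok, frame = cap.read()
    return frame if ok else None
```

With the file from the question, this could be called as frame = read_frame_sequential(cv2.VideoCapture("test.mp4"), 600), which corresponds to the last frame produced by the range(601) loop.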

So the conclusion is: when you use cap.get, cap.set, or any of the built-in VideoCaptureProperties such as cv2.CAP_PROP_FRAME_COUNT, you're essentially using Method #1, which is fast and efficient but inaccurate. In your first example, when you try to read an exact frame with cap.set, you're actually getting an "estimated" frame close to the desired Xth frame instead of the actual Xth frame. This is related to the encoding: compressed formats such as the H.264 streams commonly stored in mp4 files encode most frames as differences from other frames, so a seek has to start from a keyframe and rely on timestamps to locate the target, and how precisely that is done depends on the backend. In contrast, your second code snippet goes through each frame one by one, so when it lands on the Xth frame, that is guaranteed to be exact. That's why the two methods may give you different images for the same frame number.
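One way to see what a seek actually did is to ask the capture for its position right after cap.set and before reading. The sketch below does that; seek_and_report is a hypothetical helper name, and the constant is defined locally with the same value the question passes to cap.set (1, which is cv2.CAP_PROP_POS_FRAMES). Keep in mind that the reported position travels through the same property layers as the seek itself, so it is a hint rather than proof.

```python
CAP_PROP_POS_FRAMES = 1  # same numeric value as cv2.CAP_PROP_POS_FRAMES


def seek_and_report(cap, index):
    """Seek to `index`, then return (reported_position, ok, frame).

    `cap` is assumed to be an opened cv2.VideoCapture. `reported_position`
    is what the backend claims the next frame index is after the seek.
    """
    cap.set(CAP_PROP_POS_FRAMES, index)
    # Query the position before read(), because read() advances it.
    reported = int(cap.get(CAP_PROP_POS_FRAMES))
    ok, frame = cap.read()
    return reported, ok, frame
```

If the reported position differs from the requested index, the seek clearly did not land where it was asked to. If they match, comparing the returned frame against one obtained by sequential reading is still the more reliable check.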