I was trying to read the 600th frame of a video using cv2.VideoCapture. However, I found that the following two methods both successfully read an image, but the images are different. Which is the correct way to read the 600th frame, and why are the resulting images different? Is it related to mp4 encoding? Thanks!
Method #1
cap = cv2.VideoCapture("test.mp4")
print(cap.get(cv2.CAP_PROP_FRAME_COUNT)) # 1187
cap.set(1, 600)
ret, frame1 = cap.read() # Read the frame
Method #2
cap = cv2.VideoCapture("test.mp4")
print(cap.get(cv2.CAP_PROP_FRAME_COUNT)) # 1187
for i in range(601):
    ret, frame2 = cap.read() # Read the frame
CodePudding user response:
To read the Xth frame of a video, or similarly to determine the number of frames in a video file, there are two methods:
- Method #1: Utilize built-in OpenCV properties to access video file meta information which is fast and efficient but inaccurate
- Method #2: Manually loop over each frame in the video file with a counter which is slow and inefficient but accurate
Method #1 is fast and relies on OpenCV's video property functionality which almost instantaneously determines the frame information in a video file. However, there is an accuracy trade-off since it is dependent on your OpenCV and video codec versions. From the documentation:
Reading / writing properties involves many layers. Some unexpected result might happen along this chain. Effective behavior depends from device hardware, driver and API Backend.
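One way to see this chain at work is to ask the backend where it thinks it is right after a seek. The following is only a minimal sketch: seek_and_report is a hypothetical helper name, cap is assumed to be an already opened cv2.VideoCapture, and the constant is written out as 1 (the same value passed in cap.set(1, 600)) so the snippet does not need OpenCV to be imported. The reported position comes from the same property layer, so treat it as a hint rather than proof of which frame will be decoded.

```python
# Numeric value of cv2.CAP_PROP_POS_FRAMES, i.e. the `1` in cap.set(1, 600).
CAP_PROP_POS_FRAMES = 1


def seek_and_report(cap, index):
    """Request a seek to 0-based frame `index` and return what the backend reports.

    `cap` is assumed to be an opened cv2.VideoCapture; any object with the
    same set/get methods works. Returns (accepted, reported_index).
    """
    # set() returns a boolean saying whether the backend accepted the request
    accepted = bool(cap.set(CAP_PROP_POS_FRAMES, index))
    # get() returns a float, so convert it to an integer frame index
    reported_index = int(cap.get(CAP_PROP_POS_FRAMES))
    return accepted, reported_index
```

If reported_index differs from the index you asked for, the backend did not land where you requested. If it matches, the decoded frame can still differ, which is why the comparison against manual counting is the more reliable check.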
On the other hand, manually counting each frame until we reach the desired frame number will be 100% accurate, although it will be significantly slower. Here's an example to demonstrate the inconsistent behavior between the two methods. It attempts Method #1 by default and automatically falls back to Method #2 if that fails.
def frame_count(video_path, manual=False):
    def manual_count(handler):
        # Decode every frame until read() fails, counting as we go
        frames = 0
        while True:
            status, frame = handler.read()
            if not status:
                break
            frames += 1
        return frames

    cap = cv2.VideoCapture(video_path)
    # Slow, inefficient but 100% accurate method
    if manual:
        frames = manual_count(cap)
    # Fast, efficient but inaccurate method
    else:
        try:
            frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        except Exception:
            frames = manual_count(cap)
    cap.release()
    return frames
Benchmarks
if __name__ == '__main__':
    import timeit
    import cv2

    start = timeit.default_timer()
    print('frames:', frame_count('testtest.mp4', manual=False))
    print(timeit.default_timer() - start, '(s)')

    start = timeit.default_timer()
    print('frames:', frame_count('testtest.mp4', manual=True))
    print(timeit.default_timer() - start, '(s)')
Method #1 results
frames: 3671
0.018054921 (s)
Method #2 results
frames: 3521
9.447095287 (s)
Notice how the two methods differ by 150 frames (3671 vs. 3521) and Method #2 is significantly slower than Method #1 (about 9.4 s vs. 0.02 s). In general, if you need speed and are willing to sacrifice accuracy, use Method #1. In situations where you're fine with a delay but need the exact frame, use Method #2.
So the conclusion is: when you're using cap.get or any of the built-in VideoCaptureProperties such as cv2.CAP_PROP_FRAME_COUNT, you're essentially using Method #1, which is fast and efficient but inaccurate. In your first example, when you try to read an exact frame with cap.set, you're actually getting an "estimated" frame close to the desired Xth frame instead of the actual Xth frame.
In contrast, in your second code snippet you are manually going through each frame one by one, so when it lands on the Xth frame, that is guaranteed to be exact. That's why, when you try to read the same frame number using each of the methods, you may get different images.
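To make the comparison concrete, here is a minimal sketch of both ways of fetching a single frame. The helper names read_frame_by_seek and read_frame_by_counting are hypothetical, cap is assumed to be a freshly opened cv2.VideoCapture, and the position constant is spelled out as 1 (the value used in cap.set(1, 600)) so the helpers themselves do not import OpenCV.

```python
# Numeric value of cv2.CAP_PROP_POS_FRAMES, i.e. the `1` in cap.set(1, 600).
CAP_PROP_POS_FRAMES = 1


def read_frame_by_seek(cap, index):
    """Method #1: jump to 0-based frame `index` via the position property, then read."""
    cap.set(CAP_PROP_POS_FRAMES, index)
    ok, frame = cap.read()
    return frame if ok else None


def read_frame_by_counting(cap, index):
    """Method #2: decode frames from the start and return the one at `index`.

    Assumes `cap` was just opened, so the next read() returns frame 0.
    Returns None if the video ends before `index` is reached.
    """
    frame = None
    for _ in range(index + 1):
        ok, frame = cap.read()
        if not ok:
            return None
    return frame
```

With a real file you would open two separate captures, for example cv2.VideoCapture("test.mp4") twice, pass one to each helper with index 600, and compare the results with numpy.array_equal. If they differ, the seek did not land on the frame you asked for, and the counted frame is the one to trust.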