I've been following this Keras video classification tutorial. In the data preparation section, the load_video function loads the frames of a video fairly generically, but this line caught my eye:
frame = frame[:, :, [2, 1, 0]]
This is the first time I've encountered this. Most of the time you just append the frame as-is to your list of frames, but here they change the order of the channels (if I'm not mistaken, from RGB to BGR). I couldn't find anything related to it on the web or in their docs. Can someone give me some insight into this decision?
CodePudding user response:
From experience, the channel order depends on the framework you use to load images. OpenCV in particular orders the channels in BGR format because of internal optimizations that leverage that layout. Libraries such as scikit-image, matplotlib, and Pillow load images in the regular RGB format.
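To see concretely what channel order means, here is a NumPy-only illustration with a synthetic pixel (no image files or OpenCV needed): the same bytes represent different colors depending on the assumed ordering.

```python
import numpy as np

# A pure red pixel stored in BGR order has its 255 in the LAST channel.
pixel_bgr = np.array([[[0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis converts BGR -> RGB (and vice versa).
pixel_rgb = pixel_bgr[:, :, ::-1]
print(pixel_rgb[0, 0])  # [255   0   0]: red now sits first, as RGB expects
```

If you displayed pixel_bgr with an RGB-assuming viewer like matplotlib's imshow, it would render blue instead of red; that is exactly the mix-up the channel swap prevents.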
In fact, if you look at the load_video function, it uses OpenCV to open the video, so the incoming frames are in BGR format. Therefore, swapping the channels is necessary to get them into RGB format:
import cv2
import numpy as np

def load_video(path, max_frames=0):
    cap = cv2.VideoCapture(path)
    frames = []
    try:
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            frame = crop_center(frame)      # helper defined earlier in the tutorial
            frame = frame[:, :, [2, 1, 0]]  # reorder channels: BGR -> RGB
            frames.append(frame)
            if len(frames) == max_frames:
                break
    finally:
        cap.release()
    return np.array(frames)
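For what it's worth, the fancy-indexing trick frame[:, :, [2, 1, 0]] gives the same result as reversing the last axis with a slice (and as OpenCV's cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)). A quick NumPy-only check with a made-up frame confirms the first two agree:

```python
import numpy as np

# A tiny made-up 2x2 "frame" with distinct values in every channel
frame = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)

swapped = frame[:, :, [2, 1, 0]]  # fancy indexing, as in the tutorial
reversed_ = frame[:, :, ::-1]     # slice-based reversal of the channel axis

print(np.array_equal(swapped, reversed_))  # True
```

One practical difference: fancy indexing always makes a copy, while the ::-1 slice returns a view of the original array, so the latter is cheaper if you don't need a separate buffer.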
Strictly speaking, you do not need to reverse the channels: a neural network will learn from whatever ordering it is given. People tend to do it anyway so that images are easy to debug and display without constantly reversing channels. What matters is consistency between training and inference: if a network was trained on BGR images and you load images in RGB format, you must reverse the channels before feeding them in, because that was how the channels were represented during training.
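As a sketch of that train/inference consistency (the helper name and flag below are made up for illustration), matching the training-time channel order at inference could look like:

```python
import numpy as np

def prepare_for_inference(frame_rgb, model_expects_bgr=True):
    # Hypothetical helper: flip the channel axis only when the model
    # was trained on BGR data but the incoming frame is in RGB order.
    if model_expects_bgr:
        return frame_rgb[:, :, ::-1]
    return frame_rgb

# A red pixel in RGB order...
red_rgb = np.array([[[255, 0, 0]]], dtype=np.uint8)
# ...gets flipped to BGR order before being fed to a BGR-trained model.
print(prepare_for_inference(red_rgb)[0, 0])  # [  0   0 255]
```

The same guard works in the other direction for an RGB-trained model receiving OpenCV-loaded (BGR) frames.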
In fact, this is a common bug when using networks! Be extremely diligent and understand how the image data was preprocessed for the network before using it.