I'm using the FFMPEG Api in Rust to get RGB images from video files.
While some videos work correct and I get the frames back as expected, some work not. Or at least the result is not the way I expected it to be.
The code I use in Rust:
ffmpeg::init().unwrap();
let in_ctx = input(&Path::new(source)).unwrap();
let input = in_ctx
.streams()
.best(Type::Video)
.ok_or(ffmpeg::Error::StreamNotFound)?;
let decoder = input.codec().decoder().video()?;
let scaler = Context::get(
decoder.format(),
decoder.width(),
decoder.height(),
Pixel::RGB24,
decoder.width(),
decoder.height(),
Flags::FULL_CHR_H_INT | Flags::ACCURATE_RND,
)?; // <--- Is basically sws_getContext
// later to get the actual frame
let mut decoded = Video::empty();
if self.decoder.receive_frame(&mut decoded).is_ok() {
let mut rgb_frame = Video::empty();
self.scaler.run(&decoded, &mut rgb_frame)?; // <--- Does sws_scale
println!("Converted Pixel Format: {}", rgb_frame.format() as i32);
Ok(Some(rgb_frame))
}
Which should roughly translate to C like so:
// Get the context and video stream
SwsContext * ctx = sws_getContext(imgWidth, imgHeight,
imgFormat, imgWidth, imgHeight,
AV_PIX_FMT_RGB24, 0, 0, 0, 0);
sws_scale(ctx, decoded.data, decoded.linesize, 0, decoded.height, rgb_frame.data, rbg_frame.linesize);
And like I said earlier, sometimes it works fine and I get the expected frame back. But sometimes I get something like this: Weird result image
I saved the images as .ppm files for quick visual comparison. I used this method, which basically writes the bytes to a file with a simple .ppm header:
fn save_file(frame: &Video, index: usize) -> std::result::Result<(), std::io::Error>
{
let mut file = File::create(format!("frame{}.ppm", index))?;
file.write_all(format!("P6\n{} {}\n255\n", frame.width(), frame.height()).as_bytes())?;
file.write_all(frame.data(0))?;
Ok(())
}
Here you can see that on the left side there is a good image result vs. on the right side there is a bad image result. Comparison of the .ppm files
To come to the question now:
Why is this happening. I tested everything on my side and the only thing left is ffmpeg conversion. FFMPEG seems to convert these two test files differently even though it reports YUV420P as format for both. I cannot figure out what the difference may be...
Here the info for the two video files i used:
Good video file:
General
Complete name : /mnt/smb/Snapchat-174933781.mp4
Format : MPEG-4
Format profile : Base Media / Version 2
Codec ID : mp42 (isom/mp42)
File size : 1.90 MiB
Duration : 9 s 612 ms
Overall bit rate : 1 661 kb/s
Encoded date : UTC 2021-07-28 22:09:36
Tagged date : UTC 2021-07-28 22:09:36
eng : -180.00
Video
ID : 512
Format : AVC
Format/Info : Advanced Video Codec
Format profile : [email protected]
Format settings : CABAC / 1 Ref Frames
Format settings, CABAC : Yes
Format settings, Reference frames : 1 frame
Format settings, GOP : M=1, N=30
Codec ID : avc1
Codec ID/Info : Advanced Video Coding
Duration : 9 s 598 ms
Bit rate : 1 597 kb/s
Width : 480 pixels
Height : 944 pixels
Display aspect ratio : 0.508
Frame rate mode : Variable
Frame rate : 29.797 FPS
Minimum frame rate : 15.000 FPS
Maximum frame rate : 30.000 FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.118
Stream size : 1.83 MiB (96%)
Title : Snap Video
Language : English
Encoded date : UTC 2021-07-28 22:09:36
Tagged date : UTC 2021-07-28 22:09:36
Color range : Full
colour_range_Original : Limited
Color primaries : BT.709
Transfer characteristics : BT.601
transfer_characteristics_Original : BT.709
Matrix coefficients : BT.709
Codec configuration box : avcC
Audio
ID : 256
Format : AAC LC
Format/Info : Advanced Audio Codec Low Complexity
Codec ID : mp4a-40-2
Duration : 9 s 612 ms
Bit rate mode : Constant
Bit rate : 62.0 kb/s
Channel(s) : 1 channel
Channel layout : C
Sampling rate : 44.1 kHz
Frame rate : 43.066 FPS (1024 SPF)
Compression mode : Lossy
Stream size : 73.3 KiB (4%)
Title : Snap Audio
Language : English
Encoded date : UTC 2021-07-28 22:09:36
Tagged date : UTC 2021-07-28 22:09:36
Bad video file:
General
Complete name : /mnt/smb/Snapchat-1989594918.mp4
Format : MPEG-4
Format profile : Base Media / Version 2
Codec ID : mp42 (isom/mp42)
File size : 2.97 MiB
Duration : 6 s 313 ms
Overall bit rate : 3 948 kb/s
Encoded date : UTC 2019-07-11 06:43:04
Tagged date : UTC 2019-07-11 06:43:04
com.android.version : 9
Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : [email protected]
Format settings : 1 Ref Frames
Format settings, CABAC : No
Format settings, Reference frames : 1 frame
Format settings, GOP : M=1, N=30
Codec ID : avc1
Codec ID/Info : Advanced Video Coding
Duration : 6 s 313 ms
Bit rate : 3 945 kb/s
Width : 496 pixels
Height : 960 pixels
Display aspect ratio : 0.517
Frame rate mode : Variable
Frame rate : 29.306 FPS
Minimum frame rate : 19.767 FPS
Maximum frame rate : 39.508 FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.283
Stream size : 2.97 MiB (100%)
Title : VideoHandle
Language : English
Encoded date : UTC 2019-07-11 06:43:04
Tagged date : UTC 2019-07-11 06:43:04
Color range : Limited
Color primaries : BT.709
Transfer characteristics : BT.709
Matrix coefficients : BT.709
Codec configuration box : avcC
Or as a diff image: image diff
The problem is that I am not that familiar with ffmpeg yet I don't know all the quirks it has.
I hope someone can point me in the right direction.
CodePudding user response:
Thanks to the suggestions of @SuRGeoNix and @Jmb I played around with the linesize and width of the input.
After a bit I learned that ffmpeg requires 32bit aligned data to perform optimally. So I adjusted the scaler to scale to a 32bit aligned width and the output is fine now.
ffmpeg::init().unwrap();
let in_ctx = input(&Path::new(source)).unwrap();
let input = in_ctx
.streams()
.best(Type::Video)
.ok_or(ffmpeg::Error::StreamNotFound)?;
let decoder = input.codec().decoder().video()?;
// Round to the next 32bit divisible width
let width = if decoder.width() % 32 != 0 {
decoder.width() 32 - (decoder.width() % 32)
} else {
decoder.width()
};
let scaler = Context::get(
decoder.format(),
decoder.width(),
decoder.height(),
Pixel::RGB24,
width, // Use the calculated width here
decoder.height(),
Flags::FULL_CHR_H_INT | Flags::ACCURATE_RND,
)?;