I am trying to parse an SPS inside an avcC box in an MP4 file. For some reason, I don't get the expected timing values, while everything else is fine. Using a hex editor, I extracted these bytes to work with:
byte[] spsSmall =
{
0x67, 0x42, 0xC0, 0x1E, 0x9E, 0x21, 0x81, 0x18, 0x53, 0x4D, 0x40, 0x40,
0x40, 0x50, 0x00, 0x00, 0x03, 0x00, 0x10, 0x00, 0x00, 0x03, 0x03, 0xC8,
0xF1, 0x62, 0xEE
};
And this is the H264 Analyzer report after converting my clip from .mp4 to .h264:
Nal length 29 start code 4 bytes
ref 3 type 7 Sequence parameter set
profile: 66
constaint_set0_flag: 1
constaint_set1_flag: 1
constaint_set2_flag: 0
constaint_set3_flag: 0
level_idc: 30
seq parameter set id: 0
log2_max_frame_num_minus4: 6
pic_order_cnt_type: 0
log2_max_pic_order_cnt_lsb_minus4: 7
num_ref_frames: 2
gaps_in_frame_num_value_allowed_flag: 0
pic_width_in_mbs_minus1: 34 (560)
pic_height_in_map_minus1: 19
frame_mbs_only_flag: 1
derived height: 320
direct_8x8_inference_flag: 1
frame_cropping_flag: 0
vui_parameters_present_flag: 1
aspect_ratio_info_present_flag: 0
overscan_info_present_flag: 0
video_signal_info_present_flag: 1
video_format: 5
video_full_range_flag: 0
colour_description_present_flag: 1
colour_primaries: 1
transfer_characteristics: 1
matrix_coefficients: 1
chroma_loc_info_present_flag: 0
timing_info_present_flag: 1
num_units_in_tick: 1
time_scale: 60
fixed_frame_scale: 1
nal_hrd_parameters_present_flag: 0
vcl_hrd_parameters_present_flag: 0
pic_struct_present_flag: 0
motion_vectors_over_pic_boundaries_flag: 1
max_bytes_per_pic_denom: 0
max_bits_per_mb_denom: 0
log2_max_mv_length_horizontal: 10
log2_max_mv_length_vertical: 10
num_reorder_frames: 0
max_dec_frame_buffering: 2
So I expect num_units_in_tick to be 1 and time_scale to be 60, but my parser returns a num_units_in_tick of 48 and a time_scale of 16777216.
You can find my implementation here
I checked FFmpeg and other implementations to see whether I was missing something, but they seem to do the same thing I do. I tried other clips, and everything is still correct except the timing info. The documentation doesn't seem to offer more than what I already know. On top of that, colour_primaries, transfer_characteristics, and matrix_coefficients, which come right before the timing info, all equal 1 as expected. If my read position were off in either direction, those values would be wrong: the chance of reading that exact 24-bit sequence by accident is very low. So I am lost as to what I should do.
I found this post saying:
If you are using field-based video then this will be a field rate, so you'll have to halve it to get a frame rate.
I am not sure what that means. Even if I halve the number of bits (from 32 to 16) or divide the values by 2, I don't get anything close to the expected values.
CodePudding user response:
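You need to remove every emulation_prevention_three_byte from the NAL unit before parsing it: scan for the byte-aligned sequence 0x00, 0x00, 0x03 and drop the 0x03. Your SPS contains two such sequences, both inside the timing info, which is why only num_units_in_tick and time_scale come out wrong. The resulting unescaped spsSmall is: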
byte[] spsSmall =
{
0x67, 0x42, 0xC0, 0x1E, 0x9E, 0x21, 0x81, 0x18, 0x53, 0x4D, 0x40, 0x40,
0x40, 0x50, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x03, 0xC8, 0xF1, 0x62,
0xEE
};
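The unescaping step can be sketched as follows. This is a minimal illustration in Java (the question does not name its language, so that choice is an assumption), and the class name, method names, and the `bytes` helper are hypothetical. The bit offsets 108 and 140 are not general: they were found by walking the fields of this particular SPS by hand, and they only serve to show what a parser reads at that position before and after unescaping.

```java
import java.io.ByteArrayOutputStream;

final class SpsUnescapeSketch {

    // The 27 bytes from the question, still containing the escape bytes.
    static final byte[] SPS_SMALL = bytes(
        0x67, 0x42, 0xC0, 0x1E, 0x9E, 0x21, 0x81, 0x18, 0x53, 0x4D, 0x40, 0x40,
        0x40, 0x50, 0x00, 0x00, 0x03, 0x00, 0x10, 0x00, 0x00, 0x03, 0x03, 0xC8,
        0xF1, 0x62, 0xEE);

    // Hypothetical helper: lets the table above use plain hex literals.
    static byte[] bytes(int... values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    // Drops every 0x03 that directly follows two 0x00 bytes.
    static byte[] removeEmulationPrevention(byte[] nal) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(nal.length);
        int zeros = 0; // consecutive 0x00 bytes seen so far
        for (byte b : nal) {
            if (zeros >= 2 && b == 0x03) {
                zeros = 0; // skip the escape byte and restart the count
                continue;
            }
            zeros = (b == 0x00) ? zeros + 1 : 0;
            out.write(b);
        }
        return out.toByteArray();
    }

    // Reads `count` bits (1..32), most significant bit first, from bit offset `bitPos`.
    static long readBits(byte[] data, int bitPos, int count) {
        long value = 0;
        for (int i = 0; i < count; i++, bitPos++) {
            int bit = (data[bitPos >> 3] >> (7 - (bitPos & 7))) & 1;
            value = (value << 1) | bit;
        }
        return value;
    }

    public static void main(String[] args) {
        // Assumption: offsets valid for this SPS only (see the note above the code).
        final int numUnitsPos = 108;
        final int timeScalePos = 140;

        byte[] rbsp = removeEmulationPrevention(SPS_SMALL);

        System.out.println("escaped:   num_units_in_tick=" + readBits(SPS_SMALL, numUnitsPos, 32)
            + " time_scale=" + readBits(SPS_SMALL, timeScalePos, 32));
        System.out.println("unescaped: num_units_in_tick=" + readBits(rbsp, numUnitsPos, 32)
            + " time_scale=" + readBits(rbsp, timeScalePos, 32));
    }
}
```

Read at the same position, the escaped bytes give 48 and 16777216 (the values from the question), while the unescaped bytes give 1 and 60 (the values from the analyzer report). In a real parser, the unescaping would run once over the whole NAL unit before any field is read. As for the quoted remark about halving: for progressive video the frame rate is usually taken as time_scale / (2 * num_units_in_tick), which would be 60 / 2 = 30 frames per second here.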