I decided to experiment with file formats, and I'm using Python to read the files.
Everything I have extracted from the Ogg page header is correct, except the CRC check.
The documentation says the CRC is computed over the entire page, header included, with the CRC field itself set to 0.
I'm wondering what steps I'm missing to get the expected result.
import zlib
import struct

with open("sample3.opus", "rb") as f_:
    file_data = f_.read()

# Fixed 27-byte page header; pc is the stored CRC, ps is the segment count.
cp, ssv, htf, agp, ssn, psn, pc, ps = struct.unpack_from("<4sBBQIIIB", file_data, 0)
offset = struct.calcsize("<4sBBQIIIB")
segments = struct.unpack_from(f"<{ps}B", file_data, offset)
packet_size = 0
for num in segments:
    packet_size += num
header_size = offset + len(segments) + packet_size
# Copy the entire page, then set the CRC field to 0.
header_copy = bytearray()
header_copy.extend(file_data[0:header_size])
struct.pack_into("<I", header_copy, struct.calcsize("<4sBBQII"), 0)
print(pc)
print(zlib.crc32(header_copy))
This script results in:
277013243
752049619
The audio file I'm using:
https://filesamples.com/formats/opus
CodePudding user response:
zlib.crc32() is not the CRC that they specify. They say the initial value and final exclusive-or are both zero, whereas for zlib.crc32(), those values are both 0xffffffff. They fail to specify whether their CRC is reflected or not, so you'd need to try both to see which it is.
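To make "try both" concrete, here is a rough sketch of a bit-at-a-time CRC-32 with the initial value, final exclusive-or, and bit order as parameters. The names crc32_generic and matching_variants are made up for this illustration and are not part of zlib or any Ogg library. With reflected=True and both values set to 0xffffffff it should agree with zlib.crc32(), which is a handy sanity check before comparing against the value stored in the page.

```python
import zlib

POLY = 0x04C11DB7            # CRC-32 polynomial, forward (MSB-first) form
POLY_REFLECTED = 0xEDB88320  # the same polynomial with its bits reversed


def crc32_generic(data, init, xorout, reflected):
    """Bitwise CRC-32 with a configurable start value, final XOR and bit order."""
    crc = init
    for b in data:
        if reflected:
            # LSB-first: feed the byte into the low end and shift right.
            crc ^= b
            for _ in range(8):
                crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
        else:
            # MSB-first: feed the byte into the high end and shift left.
            crc ^= b << 24
            for _ in range(8):
                crc = ((crc << 1) ^ POLY if crc & 0x80000000 else crc << 1) & 0xFFFFFFFF
    return crc ^ xorout


def matching_variants(page_with_zeroed_crc, stored_crc):
    """Hypothetical helper: report which bit order reproduces the stored CRC."""
    variants = (("forward", False), ("reflected", True))
    return [name for name, reflected in variants
            if crc32_generic(page_with_zeroed_crc, 0, 0, reflected) == stored_crc]


# Sanity check: the zlib parameters reproduce zlib.crc32().
assert crc32_generic(b"123456789", 0xFFFFFFFF, 0xFFFFFFFF, True) == zlib.crc32(b"123456789")
```

In the question's script, header_copy and pc would be the two arguments to matching_variants.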
Update:
I checked, and it's a forward CRC. Unfortunately, you can't use zlib.crc32() to calculate it. You can compute it with this:
def crc32ogg(seq):
    crc = 0
    for b in seq:
        crc ^= b << 24
        for _ in range(8):
            crc = (crc << 1) ^ 0x104c11db7 if crc & 0x80000000 else crc << 1
    return crc
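For completeness, here is a sketch of how that function might slot into the page parsing from the question. The helper name verify_ogg_page and the two constants are invented for this example; the header layout is the one already used in the question's struct format string.

```python
import struct

OGG_HEADER = "<4sBBQIIIB"                 # fixed 27-byte page header
CRC_OFFSET = struct.calcsize("<4sBBQII")  # the CRC field starts at byte 22


def crc32ogg(seq):
    # Polynomial 0x04c11db7, initial value 0, no reflection, no final XOR.
    crc = 0
    for b in seq:
        crc ^= b << 24
        for _ in range(8):
            crc = (crc << 1) ^ 0x104c11db7 if crc & 0x80000000 else crc << 1
    return crc


def verify_ogg_page(data, start=0):
    """Return (stored_crc, computed_crc, page_length) for the page at `start`."""
    fields = struct.unpack_from(OGG_HEADER, data, start)
    stored_crc, segment_count = fields[6], fields[7]
    header_len = struct.calcsize(OGG_HEADER)
    # The segment table holds one length byte per segment of the page body.
    table_start = start + header_len
    lacing = data[table_start:table_start + segment_count]
    page_length = header_len + segment_count + sum(lacing)
    # Work on a copy so the CRC field can be zeroed before hashing.
    page = bytearray(data[start:start + page_length])
    struct.pack_into("<I", page, CRC_OFFSET, 0)
    return stored_crc, crc32ogg(page), page_length
```

Calling verify_ogg_page(file_data) on the bytes read in the question should return matching stored and computed values for an intact first page, and the returned page length can be used as the start offset of the next page.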