Recently, I'm learning the CRC32 algorithm. There are various algorithms and what I'm most interested is the paper published by intel in 2009: Fast CRC Computation Using PCLMULQDQ Instruction. And I have checked the implementation in the kernel.
I have done some mathematics and I can fully understand how the constant is calculated without the bit-reflected. But I still get confused in:
How the bit-reflect constant is calculated in the algorithm? (Section Bit-Reflection in this paper).
In the kernel's implementation, I think we should use k5'(0x163cd6124) to compute the result when we do the folding from 128 to 96 bits (line 207), but it seems that it uses the k4'(0x0ccaa009e)
I have been confused for a long time and I can't find where I make the mistake. :(
CodePudding user response:
- how bit-reflected constants are generated?
I created programs to generate the constants. Other examples of these may exist, but I couldn't find any at the time I wrote my own programs to generate the constants. I generate the constants as if not reflected, then bit reverse them. My program to generate constants for 32 bit-reflected CRC to be used with Visual Studio | MASM (ML64) is here (note the code generates more constants than the example you linked to):
https://github.com/jeffareid/crc/blob/master/crc32r/crc32rg.cpp
- uses 0x0ccaa009e
The constants are right justified, effectively multiplying them by 2^32, so 32 is subtracted from the exponents. Since PCLMULQDQ multiplies reflected values by 2, the constants are shifted left 1 bit, equivalent to dividing by 2. (For 64 bit reflected CRC, the exponent is decremented by 1 since the 64 bit constants can't be shifted). The 128 bit values have to be split up into upper and lower 64 bit values. To fold these 64 bit values forwards T 64 and T 0 bits, and using ()' to represent reflected value, the constants are (x^(T 32) mod P(x))' << 1 and (x^(T-32) mod P(x))' << 1 .
At label less_64:
, the 512 bits in 4 xmm registers are folded into a single 128 bit value, folding 128 bits at a time, using R3 = x^(128 32) mod P(x) and R4 = x^(128-32) mod P(x).
At label fold_64:
the first reduction step folds 64 bits forwards 96 bits and "also adds 32 zeroes", so 32 is not subtracted from the exponent, and it uses R4 = x^(96) mod P(x). The second reduction step folds 32 bits forwards 64 bits and adds another 32 zeroes so again, 32 is not subtracted from the exponent and it uses R5 = x^(64) mod P(x). The resulting 64 bit value will be in the rightmost 64 bits of an xmm register.
Once the folding reduces to a 64 bit value, the 32 bit CRC is calculated as follows:
quotient = 64 bit value / P(x)
product = quotient * P(x)
CRC = lower 32 bits of (64 bit value XOR product)
Instead of using a borrowless divide, the quotient is generated using a carryless multiply:
quotient = (upper 32 bits of 64 bit value) * (x^64 / P(x))
(x^64 / P(x)) is the "inverse" of P(x)
I got a file not found error using your link. I think I found another copy here. Should be the same code.
https://github.com/torvalds/linux/blob/master/arch/x86/crypto/crc32-pclmul_asm.S
Intel has a set of CRC examples at their github repository:
https://github.com/intel/isa-l/tree/master/crc
I converted 6 of these to work with Visual Studio | MASM (ML64), added some missing comments to some of the constants, and also added programs used to generate the constants:
https://github.com/jeffareid/crc
The code is similar, except that it uses 8 xmm registers to fold 1024 bits at a time (instead of 512), and the resulting 64 bit value will end up in the middle 64 bits instead of the rightmost 64 bits of an xmm register.