Optimal frequency of modulo operation in finite field arithmetic implementation-CodePudding

I'm trying to implement finite field arithmetic to use it in Elliptic Curve calculations. Since all that's ever used are arithmetic operations that commute with the modulo operator, I don't see a reason not to delaying that operation till the very end. One thing that may happen is that the numbers involved might become (way) too big and impractical/inefficient to work with, but I was wondering if there was a way to determine the optimal conditions/frequency which should trigger a modulo operation in the calculations.

I'm coding in C.

CodePudding user response：

To avoid the complexity of elliptic curve crypto (as I'm unfamiliar with its algorithm); let assume you're doing temp = (a * b) % M; result = (temp * c) % M, and you're thinking about just doing result = (a * b * c) % M instead.

Let's also assume that you're doing this a lot with the same modulo M; so you've precomputed "multiples of M" lookup tables, so that your modulo code can use the table to find the highest multiple of "M shifted left by N" that is not greater than the dividend and subtract it from dividend, and repeat that with decreasing values of N until you're left with the quotient.

If your lookup table has 256 entries, the dividend is 4096 bits and the divisor is 2048 bits; then you'd reduce the size of the dividend by 8 bits per iteration, so dividend would become smaller than the divisor (and you'd find the quotient) after no more than 256 "search and subtract" operations.

For multiplication; it's almost purely "multiply and add digits" for each pair of digits. E.g. using uint64_t as a digit, multiplying 2048 bit numbers is multiplying 32 digit numbers and involves 32 * 32 = 1024 of those "multiply and add digits" operations.

Now we can make comparisons. Specifically, assuming a, b, c, M are 2048-bit numbers:

a) the original temp = (a * b) % M; result = (temp * c) % M would be 1024 "multiply and add", then 256 "search and subtract", then 1024 "multiply and add", then 256 "search and subtract". For totals it'd be 2048 "multiply and add" and 512 "search and subtract".

b) the proposed result = (a * b * c) % M would be 1024 "multiply and add", then would be 2048 "multiply and add" (as the result of a*b will be a "twice as big" 4096-bit number), then 512 "search and subtract" (as a*b*c will be twice as big as a*b). For totals it'd be 3072 "multiply and add" and 512 "search and subtract".

In other words; (assuming lots of assumptions) the proposed result = (a * b * c) % M would be worse, with 50% more "multiply and add" and the exact same "search and subtract".

Of course none of this (the operations you need for elliptic curve crypto, the sizes of your variables, etc) can be assumed to apply for your specific case.

I was wondering if there was a way to determine the optimal conditions/frequency which should trigger a modulo operation in the calculations.

Yes; the way to determine the optimal conditions/frequency is to do similar to what I did above - determine the true costs (in terms of lower level operations, like my "search and subtract" and "multiply and add") and compare them.

In general (regardless of how modulo is implemented, etc) I'd expect you'll find that doing modulo as often as possible is the fastest option (as it reduces the cost of multiplications and also reduces the cost of later/final modulo) for all cases don't involve addition or subtraction, and that don't fit in simple integers.