Best non-trigonometric floating point approximation of tanh(x) in 10 instructions or less


Description

I need a reasonably accurate, fast hyperbolic tangent for a machine that has no built-in floating point trigonometry, so e.g. the usual tanh(x) = (exp(2x) - 1) / (exp(2x) + 1) formula would itself require an approximation of exp(2x).
All other instructions such as addition, subtraction, multiplication, division, and even FMA (= MUL + ADD in 1 op) are present.

Right now I have several approximations, but none of them are satisfactory in terms of accuracy.

[Update from the comments:]

  • The instruction for trunc()/floor() is available
  • There is a way to transparently reinterpret floats as integers and do all kinds of bit ops
  • There is a family of instructions called SEL.xx (.GT, .LE, etc.) which compare 2 values and choose what to write to the destination
  • DIVs are only about twice as slow as other ops, so nothing exceptional; DIVs are okay to use
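Those capabilities already cover the ABS/MIN/MAX/SEL-style primitives the approaches below rely on. A rough C sketch of how they reduce to one bit op and a compare (the function names are illustrative, not the platform's mnemonics; `memcpy` stands in for the transparent float/int reinterpretation):

```c
#include <stdint.h>
#include <string.h>

/* |x| by clearing the sign bit: one bit op after reinterpreting the
   float as an integer, as the platform allows. */
static float abs_via_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);   /* transparent reinterpret */
    u &= 0x7FFFFFFFu;           /* drop the sign bit */
    memcpy(&x, &u, sizeof x);
    return x;
}

/* SEL.GT-style select: write a to the destination if cmp > ref, else b. */
static float sel_gt(float cmp, float ref, float a, float b)
{
    return (cmp > ref) ? a : b;
}
```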

Approach 1

Accuracy: ±1.2% absolute error.

Pseudocode (A = accumulator register, T = temporary register):

[1] FMA T, 36.f / 73.f, A, A   // T := 36/73 + X^2
[2] MUL A, A, T                // A := X(36/73 + X^2)
[3] ABS T, A                   // T := |X(36/73 + X^2)|
[4] ADD T, T, 32.f / 73.f      // T := |X(36/73 + X^2)| + 32/73
[5] DIV A, A, T                // A := X(36/73 + X^2) / (|X(36/73 + X^2)| + 32/73)

Approach 2

Accuracy: ±0.9% absolute error.

Pseudocode (A = accumulator register, T = temporary register):

[1] FMA T, 3.125f, A, A        // T := 3.125 + X^2
[2] DIV T, 25.125f, T          // T := 25.125/(3.125 + X^2)
[3] MUL A, A, 0.1073f          // A := 0.1073*X
[4] FMA A, A, A, T             // A := 0.1073*X + 0.1073*X*25.125/(3.125 + X^2)
[5] MIN A, A, 1.f              // A := min(0.1073*X + 0.1073*X*25.125/(3.125 + X^2), 1)
[6] MAX A, A, -1.f             // A := max(min(0.1073*X + 0.1073*X*25.125/(3.125 + X^2), 1), -1)

Approach 3

Accuracy: ±0.13% absolute error.

Pseudocode (A = accumulator register, T = temporary register):

[1] FMA T, 14.f, A, A          // T := 14 + X^2
[2] FMA T, -133.f, T, T        // T := (14 + X^2)^2 - 133
[3] DIV T, A, T                // T := X/((14 + X^2)^2 - 133)
[4] FMA A, 52.5f, A, A         // A := 52.5 + X^2
[5] MUL A, A, RSQRT(15.f)      // A := (52.5 + X^2)/sqrt(15)
[6] FMA A, -120.75f, A, A      // A := (52.5 + X^2)^2/15 - 120.75
[7] MUL A, A, T                // A := ((52.5 + X^2)^2/15 - 120.75)*X/((14 + X^2)^2 - 133)
[8] MIN A, A, 1.f              // A := min(((52.5 + X^2)^2/15 - 120.75)*X/((14 + X^2)^2 - 133), 1)
[9] MAX A, A, -1.f             // A := max(min(((52.5 + X^2)^2/15 - 120.75)*X/((14 + X^2)^2 - 133), 1), -1)

The question

Is there anything better that can possibly fit in 10 non-trigonometric float32 instructions?

CodePudding user response:

Nic Schraudolph, author of the paper describing the exponential approximation that a previous version of this answer used, suggests the following. It has a maximum error of about 0.5%.

Java implementation (for portable bit munging):

public class Tanh {
  private static final float m = (float)((1 << 23) / Math.log(2));
  private static final int b = Float.floatToRawIntBits(1);

  private static float tanh(float x) {
    int y = (int)(m * x);
    float exp_x = Float.intBitsToFloat(b + y);
    float exp_minus_x = Float.intBitsToFloat(b - y);
    return (exp_x - exp_minus_x) / (exp_x + exp_minus_x);
  }

  public static void main(String[] args) {
    double error = 0;
    int end = Float.floatToRawIntBits(10);
    for (int i = 0; i <= end; i++) {
      float x = Float.intBitsToFloat(i);
      error = Math.max(error, Math.abs(tanh(x) - Math.tanh(x)));
    }
    System.out.println(error);
  }
}
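Since the asker's platform isn't Java, here is the same bit trick as a C sketch (an assumption: `memcpy` stands in for the platform's transparent float/int reinterpretation; the constants are Schraudolph's 2^23/ln 2 scale and the bit pattern of 1.0f):

```c
#include <stdint.h>
#include <string.h>

/* Schraudolph-style tanh: exp(x) ~= reinterpret(B + M*x) with
   M = 2^23/ln 2 and B = bits of 1.0f, then tanh via the
   (e^x - e^-x)/(e^x + e^-x) identity. Valid for moderate |x|
   (the original sweep goes up to 10). */
static float tanh_bits(float x)
{
    const float m = (float)((1 << 23) / 0.6931471805599453); /* 2^23 / ln 2 */
    const int32_t b = 0x3F800000;                            /* bits of 1.0f */
    int32_t y = (int32_t)(m * x);
    float ex, emx;
    int32_t t = b + y;
    memcpy(&ex, &t, sizeof ex);     /* ~exp(x)  */
    t = b - y;
    memcpy(&emx, &t, sizeof emx);   /* ~exp(-x) */
    return (ex - emx) / (ex + emx);
}
```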

CodePudding user response:

After doing much exploratory work, I came to the conclusion that approach 2 is the most promising direction. Since division is fast on the asker's platform, rational approximations are attractive. The platform's support for FMA should be exploited aggressively. Below is C code that implements a fast tanhf() in seven operations and achieves a maximum absolute error of less than 3.3e-3.

I used the Remez algorithm to compute the coefficients for the rational approximation and used a heuristic search to reduce these coefficients to as few bits as feasible, which may benefit some processor architectures that are able to incorporate floating-point data into an immediate field of commonly used floating-point instructions.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Fast computation of hyperbolic tangent. Rational approximation with clamping.
   Maximum absolute error = 3.20235857e-3 @ +/-3.21770620
*/
float fast_tanhf_rat (float x)
{
    const float n0 = -8.69873047e-1f; // -0x1.bd6000p-1
    const float n1 = -8.78143311e-3f; // -0x1.1fc000p-7
    const float d0 =  2.72656250e+0f; //  0x1.5d0000p+1
    float x2 = x * x;
    float num = fmaf (n0, x2, n1);
    float den = x2   d0;
    float quot = num / den;
    float res = fmaf (quot, x, x);
    res = fminf (fmaxf (res, -1.0f), 1.0f);
    return res;
}

int main (void)
{
    double ref, err, maxerr = 0;
    float arg, res, maxerrloc = INFINITY;
    maxerr = 0;
    arg = 0.0f;
    while (arg < 0x1.0p64f) {
        res = fast_tanhf_rat (arg);
        ref = tanh ((double)arg);
        err = fabs ((double)res - ref);
        if (err > maxerr) {
            maxerr = err;
            maxerrloc = arg;
        }
        arg = nextafterf (arg, INFINITY);
    }
    arg = -0.0f;
    while (arg > -0x1.0p64f) {
        res = fast_tanhf_rat (arg);
        ref = tanh ((double)arg);
        err = fabs ((double)res - ref);
        if (err > maxerr) {
            maxerr = err;
            maxerrloc = arg;
        }
        arg = nextafterf (arg, -INFINITY);
    }
    printf ("maximum absolute error = %.8e @ %.8e\n", maxerr, maxerrloc);
    return EXIT_SUCCESS;
}