I was calculating e^x using Taylor Series and noticed that when we calculate it for negative x absolute error is large.Is it because we don't have enough precision to calculate it?
(I know that to prevent it we can use e^(-x)=1/e^x)
#include <stdio.h>
#include <math.h>
double Exp(double x);
int main(void)
{
double x;
printf("x=");
scanf("%le", &x);
printf("%le", Exp(x));
return 0;
}
double Exp(double x)
{
double h, eps = 1.e-16, Sum = 1.0;
int i = 2;
h = x;
do
{
Sum = h;
h *= x / i;
i ;
} while (fabs(h) > eps);
return Sum ;
}
For example: x=-40 the value is 4.24835e-18 but programm gives me 3.116952e-01.The absolute error is ~0.311
x=-50 the value is 1.92875e-22 programm gives me 2.041833e 03.The absolute error is ~2041.833
CodePudding user response:
The problem is caused by rounding errors at the middle phase of the algorithm.
The h
is growing quickly as 40/2 * 40/3 * 40 / 4 * ...
and oscillating in sign. The values for i
, h
and Sum
for x=-40
for consecutive iterations can be found below (some data points omitted for brevity):
x=-40
i=2 h=800 Sum=-39
i=3 h=-10666.7 Sum=761
i=4 h=106667 Sum=-9905.67
i=5 h=-853333 Sum=96761
i=6 h=5.68889e 06 Sum=-756572
...
i=37 h=-1.37241e 16 Sum=6.63949e 15
i=38 h=1.44464e 16 Sum=-7.08457e 15
i=39 h=-1.48168e 16 Sum=7.36181e 15
i=40 h=1.48168e 16 Sum=-7.45499e 15
i=41 h=-1.44554e 16 Sum=7.36181e 15
i=42 h=1.37671e 16 Sum=-7.09361e 15
i=43 h=-1.28066e 16 Sum=6.67346e 15
i=44 h=1.16423e 16 Sum=-6.13311e 15
i=45 h=-1.03487e 16 Sum=5.50923e 15
i=46 h=8.99891e 15 Sum=-4.83952e 15
...
i=97 h=-2610.22 Sum=1852.36
i=98 h=1065.4 Sum=-757.861
i=99 h=-430.463 Sum=307.534
...
i=138 h=1.75514e-16 Sum=0.311695
i=139 h=-5.05076e-17 Sum=0.311695
3.116952e-01
The peak magnitude of sum is 7e15
. This is where the precision is lost. Type double
can be represented with about 1e-16
accuracy. This gives expected absolute error of about 0.1 - 1
.
As the expected sum (value of exp(-40)
is close to zero the final absolute error is close to the maximal absolute error of the partial sums.
For x=-50
the peak value of sum is 1.5e20
what gives the absolute error due to finite representation of double
at about 1e3 - 1e4
what is close to observed one.
Not much can be fixed without significant changes to algorithm to avoid forming those partial sums. Alternatively, compute exp(-x)
as 1/exp(x)
.
CodePudding user response:
When 'x' goes far into the negative, e^x
becomes really small compared to eps=1.e-16
. That means that h
may be less than eps
after the first iteration.
To solve this, you may use a fixed number of iterations, or use an epsilon relative to the sum so far:
do {
....
} while (fabs(h/sum) > eps);