Below is the code I use to compute skewness and kurtosis in MATLAB:
clc; clear all
% Generate "N" data points
N = 1:1:2000;
% Set sampling frequency
Fs = 1000;
% Set time step value
dt = 1/Fs;
% Frequency of the signal
f = 5;
% Generate time array
t = N*dt;
% Generate sine wave
y = 10 + 5*sin(2*pi*f*t);
% Skewness
y_skew = skewness(y);
% Kurtosis
y_kurt = kurtosis(y);
The answer acquired in MATLAB is:
y_skew = 4.468686410415491e-15
y_kurt = 1.500000000000001 (Value is positive in MATLAB)
Below is the equivalent code in Python:
import numpy as np
from scipy.stats import skew
from scipy.stats import kurtosis
# Generate "N" data points
N = np.linspace(1,2000,2000)
# Set sampling frequency
Fs = 1000
# Set time step value
dt = 1/Fs
# Frequency of the signal
f = 5
# Generate time array
t = N*dt
# Generate sine wave
y = 10 + 5*np.sin(2*np.pi*f*t)
# Skewness
y_skew = skew(y)
# Kurtosis
y_kurt = kurtosis(y)
The answer acquired in Python is:
y_skew = -1.8521564287013977e-16
y_kurt = -1.5 (Value has turned out to be negative in Python)
Can somebody please explain why MATLAB and Python give different answers for skewness and kurtosis?
In the case of kurtosis specifically, the value has changed from positive to negative. Can somebody please help me understand this?
CodePudding user response:
This is the difference between the Fisher and Pearson definitions of kurtosis.
From the MATLAB docs:
Kurtosis is a measure of how outlier-prone a distribution is. The kurtosis of the normal distribution is 3. Distributions that are more outlier-prone than the normal distribution have kurtosis greater than 3; distributions that are less outlier-prone have kurtosis less than 3. Some definitions of kurtosis subtract 3 from the computed value, so that the normal distribution has kurtosis of 0. The kurtosis function does not use this convention.
From the scipy docs:
Kurtosis is the fourth central moment divided by the square of the variance. If Fisher’s definition is used, then 3.0 is subtracted from the result to give 0.0 for a normal distribution.
Note that scipy uses Fisher's definition by default:
scipy.stats.kurtosis(a, axis=0, fisher=True, ...)
Your results would be equivalent if you used fisher=False in Python (or manually added 3), or subtracted 3 from your MATLAB result, so that both use the same definition.
So it looks like the sign is being flipped, but that's just a coincidence, since 1.5 - 3 = -1.5.
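As a quick check of the point about the two definitions, here is a minimal sketch that computes both values on the same signal. It assumes the signal is 10 + 5*sin(2*pi*f*t), as in the question; the variable names k_fisher and k_pearson are just for illustration.

```python
import numpy as np
from scipy.stats import kurtosis

# Same signal as in the question: 2000 samples at Fs = 1000 Hz, f = 5 Hz
Fs = 1000
f = 5
t = np.arange(1, 2001) / Fs
y = 10 + 5 * np.sin(2 * np.pi * f * t)

# scipy default (fisher=True): excess kurtosis, i.e. 3 is subtracted
k_fisher = kurtosis(y)

# fisher=False: Pearson definition, the convention MATLAB's kurtosis uses
k_pearson = kurtosis(y, fisher=False)

print("Fisher (scipy default):", k_fisher)
print("Pearson (fisher=False):", k_pearson)
print("Difference:", k_pearson - k_fisher)
```

The two values should differ by 3, with the Pearson one agreeing with the MATLAB result up to floating-point rounding.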
The difference in skewness appears to be due to numerical precision, since both results are essentially 0. Please see "Why is 24.0000 not equal to 24.0000 in MATLAB?"
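To treat the two skewness results as equal in practice, you can compare against zero with a tolerance instead of expecting identical digits. This is only a sketch: the tolerance of 1e-12 is an arbitrary choice for illustration, and the signal is again assumed to be 10 + 5*sin(2*pi*f*t).

```python
import numpy as np
from scipy.stats import skew

# Same signal as in the question
Fs = 1000
f = 5
t = np.arange(1, 2001) / Fs
y = 10 + 5 * np.sin(2 * np.pi * f * t)

y_skew = skew(y)

# The exact value depends on rounding in the summation, so compare with
# a tolerance rather than testing for equality with 0
is_zero = bool(np.isclose(y_skew, 0.0, atol=1e-12))

print("skewness:", y_skew)
print("zero within tolerance:", is_zero)
```

Both the MATLAB value (about 4.5e-15) and the Python value (about -1.9e-16) fall well inside such a tolerance, so they represent the same answer.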