I have a large dataset data with many non-numeric columns x1, x2, ... x30, and a numeric column y.
I would like to compute a mean absolute deviation (MAD) for y per different x1 and x2 combinations.
Say, for x1 == 'A' and x2 == 'B', I want to compute MAD for y. I did:
data %>%
  group_by(x1, x2) %>%
  filter(x1 == "A", x2 == "B") %>%
  summarise(mad = mad(y, center = mean(y)))
However, when I compute it manually, it returns a different value:
data %>%
  group_by(x1, x2) %>%
  filter(x1 == "A", x2 == "B") %>%
  summarise(manual_mad = sum(abs(y - mean(y))) / n())
Which computation is correct, and how should I change one or the other so that both return the same value?
CodePudding user response:
From the documentation of ?mad:
The actual value calculated is constant * cMedian(abs(x - center)).
Indeed, with 1.4826 being the default value of constant, we get the same result manually:
y = 1:10
mad(y, center = mean(y))
#[1] 3.7065
1.4826 * median(abs(y - mean(y)))
#[1] 3.7065
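Note that for 1:10 the median and the mean of the absolute deviations happen to coincide (both are 2.5), which can hide the difference. The sketch below uses a made-up skewed vector (`y_skew`, not from the question) to show that setting `constant = 1` only removes the scaling factor; `mad()` still takes a median, so it does not in general match the mean-based manual formula.

```r
# Hypothetical skewed vector, chosen so that median and mean differ
y_skew <- c(1, 2, 3, 4, 100)

# Absolute deviations from the mean (mean is 22): 21, 20, 19, 18, 78
abs_dev <- abs(y_skew - mean(y_skew))

# mad() with constant = 1: unscaled *median* of the absolute deviations
median_ad <- mad(y_skew, center = mean(y_skew), constant = 1)

# The manual formula from the question: *mean* of the absolute deviations
mean_ad <- sum(abs_dev) / length(y_skew)

median_ad  # 20
mean_ad    # 31.2
```

So no choice of `constant` makes the two computations agree in general; they measure different things.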
CodePudding user response:
Apparently you are looking for the mean absolute deviation, which is defined as:
MAD = Σ(|x_i - μ|)/n
mnad <- \(x, ...) mean(abs(x - mean(x, ...)), ...)
mnad(1:9)
# [1] 2.222222
The mad() function, by contrast, calculates the median absolute deviation.
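Applied to the grouped pipeline from the question, this might look like the sketch below. The toy tibble and the names `mean_abs_dev`, `mean_ad` and `median_ad` are assumptions made for illustration, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical toy data standing in for the asker's `data`
data <- tibble(
  x1 = c("A", "A", "A", "A", "C", "C"),
  x2 = c("B", "B", "B", "B", "D", "D"),
  y  = c(1, 2, 3, 10, 5, 7)
)

# Mean absolute deviation around the mean (illustrative helper)
mean_abs_dev <- function(x, na.rm = FALSE) {
  mean(abs(x - mean(x, na.rm = na.rm)), na.rm = na.rm)
}

result <- data %>%
  filter(x1 == "A", x2 == "B") %>%   # filter first so only needed rows are grouped
  group_by(x1, x2) %>%
  summarise(
    mean_ad    = mean_abs_dev(y),             # mean absolute deviation
    manual_mad = sum(abs(y - mean(y))) / n(), # the question's manual formula
    median_ad  = mad(y, center = mean(y)),    # scaled median absolute deviation
    .groups = "drop"
  )

result
```

Here `mean_ad` and `manual_mad` agree, while `median_ad` is the scaled median absolute deviation and differs from both. Dropping the `filter()` line would give one row per x1/x2 combination.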