Difference between MAD function and manual MAD computation in R

I have a large dataset, data, with many non-numeric columns x1, x2, ..., x30 and a numeric column y.

I would like to compute the mean absolute deviation (MAD) of y for each combination of x1 and x2.

For example, for x1 == "A" and x2 == "B", I computed the MAD of y like this:

data %>%
   group_by(x1, x2) %>%
   filter(x1 == "A", x2 == "B") %>%
   summarise(mad = mad(y, center = mean(y)))

However, when I compute it manually, I get a different value:

data %>%
   group_by(x1, x2) %>%
   filter(x1 == "A", x2 == "B") %>%
   summarise(manual_mad = sum(abs(y - mean(y)))/n())

Which of the two computations is correct, and how should I change one or the other so that both return the same value?

CodePudding user response：

From the documentation of ?mad:

The actual value calculated is constant * cMedian(abs(x - center)).

Indeed, with the default constant of 1.4826, computing this by hand gives the same result as mad():

y = 1:10
mad(y, center = mean(y))
#[1] 3.7065

1.4826 * median(abs(y - mean(y)))
#[1] 3.7065
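
To see why the manual formula from the question does not match, here is a small sketch. The vector y below is made up for illustration (it is deliberately skewed). Setting constant = 1 turns off the scaling, so mad() returns the plain median of the absolute deviations, which is still not the same as their mean:

```r
# Illustrative data (not from the question); skewed so median and mean differ
y <- c(1, 2, 3, 4, 100)

# mad() with scaling turned off: median of |y - mean(y)|
mad_unscaled <- mad(y, center = mean(y), constant = 1)

# The manual formula from the question: mean of |y - mean(y)|
manual <- sum(abs(y - mean(y))) / length(y)

mad_unscaled
#[1] 20

manual
#[1] 31.2
```

So no choice of center makes mad() return a mean of absolute deviations; the two are different statistics.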

CodePudding user response：

Apparently you are looking for the mean absolute deviation, which is defined as:

MAD = Σ(|x_i - μ|)/n

mnad <- \(x, ...) mean(abs(x - mean(x, ...)), ...)

mnad(1:9)
# [1] 2.222222

The mad() function, by contrast, calculates the median absolute deviation (multiplied by constant, which is 1.4826 by default).
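
For the grouped case in the question, a minimal sketch in base R could look like the following. The data frame and the helper name mean_abs_dev are invented for illustration and are not the asker's real data; with dplyr, the same helper could presumably be called inside summarise() instead.

```r
# Hypothetical helper: mean absolute deviation around the mean
mean_abs_dev <- function(x, na.rm = FALSE) {
  mean(abs(x - mean(x, na.rm = na.rm)), na.rm = na.rm)
}

# Toy data standing in for the question's `data` (values are invented)
data <- data.frame(
  x1 = c("A", "A", "A", "C", "C"),
  x2 = c("B", "B", "B", "D", "D"),
  y  = c(1, 2, 6, 10, 20)
)

# One row per (x1, x2) combination; the y column holds the statistic
result <- aggregate(y ~ x1 + x2, data = data, FUN = mean_abs_dev)
result

# dplyr equivalent (assuming dplyr is loaded):
# data %>% group_by(x1, x2) %>% summarise(mnad = mean_abs_dev(y))
```

For the toy data, the A/B group has deviations 2, 1 and 3 around its mean of 3, so its mean absolute deviation is 2; the C/D group gives 5.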