I am trying to create my own mean function. However, I want to add a trim argument so that outliers at both the lower and upper ends are excluded. How do I do this?
Below is the mean function I currently have:
mymeanfunction <- function(x) {
  xbar <- sum(x)/length(x)
  xbar
}
Answer:
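Something like this? The function below accepts the extra argument na.rm and the dots argument (...), so that the caller can request other whisker lengths through the boxplot.stats argument coef. Values that boxplot.stats flags as outliers are dropped before the mean is computed.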
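mymean <- function(x, trim, na.rm = FALSE, ...) {
  # trim is accepted but not used: outliers are defined by the
  # boxplot whisker rule (coef, passed through ...) instead
  out <- boxplot.stats(x, ...)$out
  # drop flagged values; NAs are never flagged, so na.rm decides their fate
  y <- x[!x %in% out]
  mean(y, na.rm = na.rm)
}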
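A quick usage sketch of mymean, using a made-up vector z (hypothetical data, not from the question). With the default coef = 1.5, boxplot.stats flags values lying more than 1.5 hinge spreads beyond the hinges, so the extreme value is dropped; passing a much larger coef through the dots widens the whiskers until nothing is flagged.

```r
mymean <- function(x, trim, na.rm = FALSE, ...) {
  out <- boxplot.stats(x, ...)$out   # values beyond the whiskers
  y <- x[!x %in% out]                # drop them before averaging
  mean(y, na.rm = na.rm)
}

# hypothetical example data with one obvious outlier
z <- c(1, 2, 3, 4, 100)

boxplot.stats(z)$out    # 100 is flagged with the default coef = 1.5
mymean(z)               # mean of 1:4, i.e. 2.5
mymean(z, coef = 50)    # whiskers now reach 100, so nothing is dropped: 22
```

Note that this removes outliers by the boxplot rule rather than by a fixed fraction, so the number of values dropped depends on the data.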
Answer:
The trim= argument of mean gives the fraction (0 to 0.5) of observations to drop from each end of the sorted vector before the mean is computed. You can approximate this with quantiles and write:
mymeanfunction <- function(x, trim=0) {
  if (trim > 0) {
    # cut-offs at the trim and 1 - trim quantiles
    q <- quantile(x, c(trim, 1 - trim))
    # keep only values strictly inside the cut-offs
    x <- x[x > q[1] & x < q[2]]
  }
  xbar <- sum(x)/length(x)
  xbar
}
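The quantile filter is not exactly the rule mean() documents: trim is the fraction of observations dropped from each end, and when values are tied at a cut-off the strict inequalities may remove more than that. A minimal sketch that follows the documented rule instead sorts x and drops floor(length(x) * trim) values from each end. The name trimmed_mean is hypothetical, the function assumes x contains no NAs, and the median fallback for trim >= 0.5 mirrors what mean() does in that case.

```r
# Hypothetical helper: trimmed mean by sorting and dropping
# floor(n * trim) observations from each end.
# Assumes no NAs in x (sort() would silently drop them).
trimmed_mean <- function(x, trim = 0) {
  if (trim >= 0.5) return(median(x))  # mean() also returns the median here
  x <- sort(x)
  n <- length(x)
  k <- floor(n * trim)                # number of values to drop at each end
  if (k > 0) x <- x[(k + 1):(n - k)]
  sum(x) / length(x)
}

set.seed(42)
y <- runif(10)
trimmed_mean(y, trim = 0.1)   # should agree with mean(y, trim = 0.1)
mean(y, trim = 0.1)
```

With tied values, for example c(1, 1, 2:9) and trim = 0.1, this drops exactly one value from each end and returns 4.5, the same as mean(c(1, 1, 2:9), trim = 0.1).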
To implement NA handling, you could enhance the function like this:
mymeanfunction <- function(x, trim=0, na.rm=FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  if (anyNA(x)) {
    # like mean(), return NA when NAs remain and na.rm = FALSE
    xbar <- NA_real_
  } else {
    if (trim > 0) {
      q <- quantile(x, c(trim, 1 - trim))
      x <- x[x > q[1] & x < q[2]]
    }
    xbar <- sum(x)/length(x)
  }
  xbar
}
mymeanfunction(x, na.rm=TRUE)
# [1] 0.6362622
mymeanfunction(x, trim=.1, na.rm=TRUE)
# [1] 0.66136
## compare
mean(x, na.rm=TRUE)
# [1] 0.6362622
mean(x, trim=.1, na.rm=TRUE)
# [1] 0.66136
If there's no NA in the data, we don't need to specify na.rm=TRUE.
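As a quick check of the NA behaviour, here is a condensed version of the function above applied to a small made-up vector w (hypothetical data). It returns NA while NAs remain and na.rm = FALSE, just as mean() does, and drops them when na.rm = TRUE.

```r
mymeanfunction <- function(x, trim = 0, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  if (anyNA(x)) return(NA_real_)          # NAs left and na.rm = FALSE
  if (trim > 0) {
    # quantile() errors on NAs, hence the check above
    q <- quantile(x, c(trim, 1 - trim))
    x <- x[x > q[1] & x < q[2]]
  }
  sum(x) / length(x)
}

w <- c(2, 4, NA, 6)               # hypothetical data
mymeanfunction(w)                 # NA, like mean(w)
mymeanfunction(w, na.rm = TRUE)   # 4, like mean(w, na.rm = TRUE)
```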
Note that mean will be faster, since its computation is implemented in C. But for educational purposes, this version shows what's going on.
Data:
set.seed(42)
x <- c(runif(10), NA_real_)