Here is a sample dataset:
data <- data.frame(x=c(4,3,4,4,99),
y=c(4,NA,3,2,4),
z = c(88,NA,4,4,5),
w = c(4,5,2,3,4))
I would like to create a new column of row means using rowMeans. I would like to keep na.rm=F, because if a value is truly NA I do not want a mean calculated for that row. But if a value is either 88 or 99, I would like R to ignore it when calculating the mean and still use the remaining valid values. So far I have the below.
data$mean <- rowMeans(subset(data, select = c(`x`,`y`,`z`,`w`)), na.rm = T)
But I am not sure how to make it ignore the 88 and 99 values in the calculation.
This is what I am hoping to get:
data <- data.frame(x=c(4,3,4,4,99),
y=c(4,NA,3,2,4),
z = c(88,NA,4,4,5),
w = c(4,5,2,3,4),
mean=c(4,NA,3.25,3.25,4.3))
Any help is appreciated - thank you!
CodePudding user response:
Using rowMeans, still with na.rm=TRUE, but only on the subset of rows without a true NA, and temporarily replacing 88 and 99 with NA.
s <- rowSums(is.na(data)) == 0 ## store row subset
v <- c("x", "y", "z", "w") ## col subset to calc. mean
data$mean <- NA ## ini column
m <- as.matrix(data[v]) ## we'll need a matrix
data$mean[s] <- rowMeans(replace(m[s, v], m[s, v] %in% c(88, 99), NA), na.rm=TRUE)
data
# x y z w mean
# 1 4 4 88 4 4.000000
# 2 3 NA NA 5 NA
# 3 4 3 4 2 3.250000
# 4 4 2 4 3 3.250000
# 5 99 4 5 4 4.333333
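The rowMeans approach above can also be wrapped in a small reusable function. This is only a sketch: the name row_means_skip_codes and its arguments are hypothetical, not part of the answer or of base R. It marks rows that contain a true NA first, then treats the codes as missing, averages the rest, and finally puts NA back for the marked rows.

```r
# Hypothetical helper (not from the original answer): row means that skip
# sentinel codes but return NA for any row containing a genuine NA.
row_means_skip_codes <- function(df, cols, codes = c(88, 99)) {
  m <- as.matrix(df[cols])
  has_na <- rowSums(is.na(m)) > 0    # rows with a true NA
  m[m %in% codes] <- NA              # treat sentinel codes as missing
  out <- rowMeans(m, na.rm = TRUE)   # average the remaining values
  out[has_na] <- NA                  # restore NA for truly incomplete rows
  out
}

data <- data.frame(x = c(4, 3, 4, 4, 99),
                   y = c(4, NA, 3, 2, 4),
                   z = c(88, NA, 4, 4, 5),
                   w = c(4, 5, 2, 3, 4))
data$mean <- row_means_skip_codes(data, c("x", "y", "z", "w"))
```

One caveat of this sketch: a row made up entirely of 88/99 codes would come out as NaN, since nothing is left to average.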
Or simply use apply, but this is much slower.
f <- \(x) if (any(is.na(x))) NA else mean(x[!x %in% c(88, 99)])
cbind(data, mean=apply(data, 1, f))
# x y z w mean
# 1 4 4 88 4 4.000000
# 2 3 NA NA 5 NA
# 3 4 3 4 2 3.250000
# 4 4 2 4 3 3.250000
# 5 99 4 5 4 4.333333
Timings from microbenchmark:
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# apply 35.018162 35.941815 38.834333 36.394632 36.960161 212.469412 100 b
# rowMeans 1.097393 1.119396 1.493563 1.193787 1.226691 9.352118 100 a
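The data size behind the timings above is not stated, so they cannot be reproduced exactly. A rough sketch of how such a comparison could be set up follows. The 10,000-row replicated data frame, the wrapper names use_rowMeans and use_apply, and times = 20 are all assumptions made for illustration.

```r
data <- data.frame(x = c(4, 3, 4, 4, 99),
                   y = c(4, NA, 3, 2, 4),
                   z = c(88, NA, 4, 4, 5),
                   w = c(4, 5, 2, 3, 4))
# Assumption: replicate the sample rows to get a larger test set (10,000 rows).
big <- data[rep(seq_len(nrow(data)), 2000), ]
v <- c("x", "y", "z", "w")

# rowMeans variant: only rows without a true NA are averaged.
use_rowMeans <- function(d) {
  m <- as.matrix(d[v])
  s <- rowSums(is.na(m)) == 0
  out <- rep(NA_real_, nrow(m))
  ms <- m[s, , drop = FALSE]
  out[s] <- rowMeans(replace(ms, ms %in% c(88, 99), NA), na.rm = TRUE)
  out
}

# apply variant: one function call per row.
use_apply <- function(d) {
  f <- function(x) if (any(is.na(x))) NA else mean(x[!x %in% c(88, 99)])
  unname(apply(d[v], 1, f))
}

# Both variants should agree before their speed is compared.
stopifnot(isTRUE(all.equal(use_rowMeans(big), use_apply(big))))

# Run the benchmark only if the microbenchmark package is installed.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(apply = use_apply(big),
                                       rowMeans = use_rowMeans(big),
                                       times = 20))
}
```

Absolute timings will depend on the machine and the data size, so only the relative difference between the two rows of the output is meaningful.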
CodePudding user response:
data <- data.frame(x=c(4,3,4,4,99),
y=c(4,NA,3,2,4),
z = c(88,NA,4,4,5),
w = c(4,5,2,3,4))
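data$mean <- apply(data, 1, function(x) {
  idx <- which(!(x %in% c(88, 99)))  ## positions of values that are not 88 or 99
  mean(x[idx], na.rm = TRUE)         ## na.rm = TRUE also drops true NAs, so row 2 gives 4, not NA
})
x y z w mean
1 4 4 88 4 4.000000
2 3 NA NA 5 4.000000
3 4 3 4 2 3.250000
4 4 2 4 3 3.250000
5 99 4 5 4 4.333333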