I have the following matrix:
r1 <- c("M","A","T","D","T","Y")
r2 <- c("M","A","G","G","D", "J")
r3 <- c("M","B","H","G","T", "Y")
r4 <- c("M","B","G","G","X", "Y")
r5 <- c("F","A","H","D","T", "Y")
n.mat <- rbind(r1,r2,r3,r4,r5)
n.mat <- as.data.frame(n.mat)
I would like to replace the values in every column that has only two unique values with that column's most frequent value, and leave columns with more than two unique values as they are.
Desired output:
r1 <- c("M","A","T","G","T","Y")
r2 <- c("M","A","G","G","D", "Y")
r3 <- c("M","A","H","G","T", "Y")
r4 <- c("M","A","G","G","X", "Y")
r5 <- c("M","A","H","G","T", "Y")
n.mat <- rbind(r1,r2,r3,r4,r5)
n.mat <- as.data.frame(n.mat)
CodePudding user response:
We may use the Mode function with a condition check on the number of unique elements in the column, i.e. if the number of unique elements is greater than 2, return the column unchanged, or else get the Mode:
n.mat[] <- lapply(n.mat, function(x) if(length(unique(x)) > 2) x else Mode(x))
where
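Mode <- function(x) {
  # distinct values, in order of first appearance
  ux <- unique(x)
  # count each distinct value and return the most frequent one
  ux[which.max(tabulate(match(x, ux)))]
}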
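To see what Mode does step by step, here is a small sketch on column V4 from the question. The variable name v4 is only for illustration. Note that on a tie, which.max returns the first maximum, so the value that appears first in the column wins.

```r
# Mode as defined in the answer
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

v4 <- c("D", "G", "G", "G", "D")   # column V4 from the question (illustrative name)

unique(v4)                         # "D" "G"
match(v4, unique(v4))              # 1 2 2 2 1  (position of each value in unique(v4))
tabulate(match(v4, unique(v4)))    # 2 3        (counts for "D" and "G")
Mode(v4)                           # "G"

# Tie: "a" and "b" both occur twice; the first one encountered is returned
Mode(c("a", "b", "b", "a"))        # "a"
```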
-output
> n.mat
V1 V2 V3 V4 V5 V6
r1 M A T G T Y
r2 M A G G D Y
r3 M A H G T Y
r4 M A G G X Y
r5 M A H G T Y
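If you want to verify the result against the desired output from the question, something like the following should do. It is a self-contained sketch; stringsAsFactors = FALSE and the name expected are my additions, used so the columns are plain character vectors regardless of R version.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Input data from the question
n.mat <- as.data.frame(rbind(
  r1 = c("M","A","T","D","T","Y"),
  r2 = c("M","A","G","G","D","J"),
  r3 = c("M","B","H","G","T","Y"),
  r4 = c("M","B","G","G","X","Y"),
  r5 = c("F","A","H","D","T","Y")
), stringsAsFactors = FALSE)

# Columns with more than two unique values are kept; the others are
# replaced by their Mode (the single value is recycled down the column)
n.mat[] <- lapply(n.mat, function(x) if (length(unique(x)) > 2) x else Mode(x))

# Desired output from the question ('expected' is an illustrative name)
expected <- as.data.frame(rbind(
  r1 = c("M","A","T","G","T","Y"),
  r2 = c("M","A","G","G","D","Y"),
  r3 = c("M","A","H","G","T","Y"),
  r4 = c("M","A","G","G","X","Y"),
  r5 = c("M","A","H","G","T","Y")
), stringsAsFactors = FALSE)

# Stops with an error if any cell differs
stopifnot(all(n.mat == expected))
```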
If we need a matrix as output, use as.matrix on the output above
n.mat <- as.matrix(n.mat)
Or use apply instead of lapply
apply(n.mat, 2, FUN = function(x) if(length(unique(x)) > 2) x
else rep(Mode(x), length(x)))
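Keep in mind that apply returns a new matrix rather than modifying n.mat, so the result has to be assigned. A minimal sketch, starting from a plain character matrix; the name res is just for illustration, and the row names are set explicitly rather than relying on apply to carry them over.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Input data from the question, kept as a character matrix
n.mat <- rbind(
  r1 = c("M","A","T","D","T","Y"),
  r2 = c("M","A","G","G","D","J"),
  r3 = c("M","B","H","G","T","Y"),
  r4 = c("M","B","G","G","X","Y"),
  r5 = c("F","A","H","D","T","Y")
)

# Each column must come back with the same length, hence rep()
res <- apply(n.mat, 2, function(x) {
  if (length(unique(x)) > 2) x else rep(Mode(x), length(x))
})

# Set the row names explicitly
rownames(res) <- rownames(n.mat)
res
```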