Replace value in column with two variables by the most frequent value in matrix in R

Time:11-08

I have the following matrix (converted to a data frame):

r1 <- c("M", "A", "T", "D", "T", "Y")
r2 <- c("M", "A", "G", "G", "D", "J")
r3 <- c("M", "B", "H", "G", "T", "Y")
r4 <- c("M", "B", "G", "G", "X", "Y")
r5 <- c("F", "A", "H", "D", "T", "Y")
n.mat <- rbind(r1, r2, r3, r4, r5)
n.mat <- as.data.frame(n.mat)

I would like to replace the values in every column that has only two unique values with that column's most frequent value, and leave columns with more than two unique values as they are.
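Counting the unique values per column shows which columns fall under this rule. A minimal base R sketch; the names n_unique and to_collapse are only illustrative, and stringsAsFactors = FALSE is passed explicitly so the columns stay character in R versions before 4.0 as well:

```r
# Rebuild the example data (row names r1..r5, columns V1..V6)
n.mat <- as.data.frame(rbind(
  r1 = c("M", "A", "T", "D", "T", "Y"),
  r2 = c("M", "A", "G", "G", "D", "J"),
  r3 = c("M", "B", "H", "G", "T", "Y"),
  r4 = c("M", "B", "G", "G", "X", "Y"),
  r5 = c("F", "A", "H", "D", "T", "Y")
), stringsAsFactors = FALSE)

# Number of distinct values in each column
n_unique <- vapply(n.mat, function(x) length(unique(x)), integer(1))
print(n_unique)

# Columns with at most two distinct values are the ones to collapse to their mode
to_collapse <- names(n_unique)[n_unique <= 2]
print(to_collapse)
```

Here V3 and V5 have three distinct values each, so only V1, V2, V4 and V6 are affected.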

Expected output:

r1 <- c("M", "A", "T", "G", "T", "Y")
r2 <- c("M", "A", "G", "G", "D", "Y")
r3 <- c("M", "A", "H", "G", "T", "Y")
r4 <- c("M", "A", "G", "G", "X", "Y")
r5 <- c("M", "A", "H", "G", "T", "Y")
n.mat <- rbind(r1, r2, r3, r4, r5)
n.mat <- as.data.frame(n.mat)

Answer:

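We can combine a Mode helper with a check on the number of unique elements in each column: if a column has more than two unique values, return it unchanged; otherwise, replace it with its most frequent value. Assigning to n.mat[] (rather than n.mat) keeps the data frame structure, and each single Mode value is recycled to fill its column.

n.mat[] <- lapply(n.mat, function(x) if (length(unique(x)) > 2) x else Mode(x))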

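where Mode is a small helper that returns the most frequent value and must be defined before running the line above (base R's mode() reports an object's type, not its most frequent value):

Mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]  # count each value and return the most frequent
}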
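Because which.max() returns the position of the first maximum, this Mode breaks ties in favour of the value that appears first in the column. That cannot happen with the five rows here, but it can for a two-valued column with an even number of rows. A quick check; the input vectors are made up for illustration:

```r
Mode <- function(x) {
  ux <- unique(x)                       # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))] # which.max() picks the first maximum on a tie
}

# Clear majority: "A" appears three times, "B" twice (same as column V2 in the question)
majority <- Mode(c("A", "A", "B", "B", "A"))
print(majority)

# Tie: "B" and "A" both appear twice; "B" is seen first, so it is returned
tied <- Mode(c("B", "A", "A", "B"))
print(tied)
```

If a different tie rule is wanted (for example, alphabetical), the unique values would need to be reordered before counting.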

Output:

> n.mat
   V1 V2 V3 V4 V5 V6
r1  M  A  T  G  T  Y
r2  M  A  G  G  D  Y
r3  M  A  H  G  T  Y
r4  M  A  G  G  X  Y
r5  M  A  H  G  T  Y
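To confirm that the result matches the expected output from the question, the whole pipeline can be run end to end and compared with the target table. A self-contained sketch in base R; expected is just an illustrative name for the question's desired output:

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Input data from the question
n.mat <- as.data.frame(rbind(
  r1 = c("M", "A", "T", "D", "T", "Y"),
  r2 = c("M", "A", "G", "G", "D", "J"),
  r3 = c("M", "B", "H", "G", "T", "Y"),
  r4 = c("M", "B", "G", "G", "X", "Y"),
  r5 = c("F", "A", "H", "D", "T", "Y")
), stringsAsFactors = FALSE)

# Desired output from the question
expected <- as.data.frame(rbind(
  r1 = c("M", "A", "T", "G", "T", "Y"),
  r2 = c("M", "A", "G", "G", "D", "Y"),
  r3 = c("M", "A", "H", "G", "T", "Y"),
  r4 = c("M", "A", "G", "G", "X", "Y"),
  r5 = c("M", "A", "H", "G", "T", "Y")
), stringsAsFactors = FALSE)

# The answer's one-liner: keep wide columns, collapse the others to their mode
n.mat[] <- lapply(n.mat, function(x) if (length(unique(x)) > 2) x else Mode(x))

matches <- isTRUE(all.equal(n.mat, expected))
print(matches)
```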

If a matrix is needed as output, convert the result above with as.matrix:

n.mat <- as.matrix(n.mat)

Or use apply instead of lapply, which returns a matrix directly:

apply(n.mat, 2, FUN = function(x)
  if (length(unique(x)) > 2) x else rep(Mode(x), length(x)))
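The rep() call matters here: apply() only simplifies to a matrix when every call to FUN returns a vector of the same length; otherwise it returns a list. The sketch below shows that, and checks that the apply and lapply routes give the same values once dimnames are dropped. The helper collapse_col and the object names are hypothetical, introduced only for this comparison:

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

n.mat <- as.data.frame(rbind(
  r1 = c("M", "A", "T", "D", "T", "Y"),
  r2 = c("M", "A", "G", "G", "D", "J"),
  r3 = c("M", "B", "H", "G", "T", "Y"),
  r4 = c("M", "B", "G", "G", "X", "Y"),
  r5 = c("F", "A", "H", "D", "T", "Y")
), stringsAsFactors = FALSE)

# Hypothetical helper: keep columns with > 2 unique values, else return the mode
collapse_col <- function(x) if (length(unique(x)) > 2) x else Mode(x)

# Without rep(), some columns return 5 values and others 1, so apply() gives a list
without_rep <- apply(n.mat, 2, collapse_col)
print(is.list(without_rep))

# With rep(), every column returns 5 values, so apply() builds a character matrix
via_apply <- apply(n.mat, 2, function(x) {
  if (length(unique(x)) > 2) x else rep(Mode(x), length(x))
})

# lapply route, converted to a matrix for comparison
via_lapply <- n.mat
via_lapply[] <- lapply(via_lapply, collapse_col)
via_lapply <- as.matrix(via_lapply)

# Compare values only; the two routes may carry different dimnames
same <- identical(unname(via_apply), unname(via_lapply))
print(same)
```

The lapply version keeps the data frame (and its row names) intact, which is why the answer uses it first; apply is convenient when a matrix is the desired end result anyway.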