How to find duplicated values in each row of a data frame in R?

Time:01-03

I have the data frame below and want to find the value that is duplicated within each row. See the input and expected output below. In the first row, 0 appears twice, so column rep should be 0 (data_input[1,"rep"]=0). In the second row, 2 appears twice, so rep should be 2. The third row has no repeated values, so rep can be 4 (or any value other than 0, 1 or 2). In the fourth row, 1 appears three times, so rep should be 1.

data_input <- data.frame(X1 = c(0, 1, 2, 1), X2 = c(0, 2, 1, 1),
                         X3 = c(1, 2, 0, 1))

data_output <- data.frame(X1 = c(0, 1, 2, 1), X2 = c(0, 2, 1, 1),
                          X3 = c(1, 2, 0, 1), rep = c(0, 2, 4, 1))

CodePudding user response:

Here is an option with rowwise: process the data frame row by row, take the first duplicated element in each row, and if there is none, replace the resulting NA with 4.

library(dplyr)
library(tidyr)
data_input %>% 
  rowwise %>% 
  mutate(rep = {tmp <- c_across(everything())
          replace_na(tmp[duplicated(tmp)][1], 4)
   }) %>%
  ungroup

-output

# A tibble: 4 × 4
     X1    X2    X3   rep
  <dbl> <dbl> <dbl> <dbl>
1     0     0     1     0
2     1     2     2     2
3     2     1     0     4
4     1     1     1     1

The solution above doesn't handle rows that contain more than one duplicated value. If that can happen, either create a list column or paste the unique duplicated values together into a single string.

data_input %>% 
  rowwise %>% 
  mutate(rep = {tmp <- c_across(everything())
                tmp <- toString(sort(unique(tmp[duplicated(tmp)])))
                replace(tmp, tmp == "", "4")
   }) %>%
  ungroup

-output

# A tibble: 4 × 4
     X1    X2    X3 rep  
  <dbl> <dbl> <dbl> <chr>
1     0     0     1 0    
2     1     2     2 2    
3     2     1     0 4    
4     1     1     1 1    
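The list-column option mentioned above is not shown in the answer. A minimal sketch of what it could look like follows, assuming the same data_input and a dplyr version that provides c_across; the name res is only illustrative. Wrapping the result in list() lets each row hold zero, one, or several duplicated values.

```r
library(dplyr)

data_input <- data.frame(X1 = c(0, 1, 2, 1), X2 = c(0, 2, 1, 1),
                         X3 = c(1, 2, 0, 1))

res <- data_input %>%
  rowwise() %>%
  mutate(rep = {
    tmp <- c_across(everything())
    # list() makes rep a list column: one numeric vector per row
    list(sort(unique(tmp[duplicated(tmp)])))
  }) %>%
  ungroup()

# number of distinct duplicated values found in each row
lengths(res$rep)
```

With this layout a row without duplicates gets an empty vector (row 3 in the example data) rather than a placeholder such as 4, which may or may not be what you want downstream.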

Or using base R (the \(x) lambda shorthand requires R 4.1 or later)

data_input$rep <- apply(data_input, 1, FUN = \(x) x[anyDuplicated(x)][1])
data_input$rep[is.na(data_input$rep)] <- 4
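The two base R lines above rely on anyDuplicated returning 0 when a row has no duplicate, which yields NA that is then overwritten. As a sketch only, the same idea can be wrapped in a small helper that takes the fallback value as an argument; first_dup and its default parameter are hypothetical names, not part of the original answer.

```r
# Hypothetical helper: first value that repeats within x, or `default` if none
first_dup <- function(x, default = NA) {
  idx <- anyDuplicated(x)  # index of the first duplicate, 0 if there is none
  if (idx == 0) default else x[[idx]]
}

data_input <- data.frame(X1 = c(0, 1, 2, 1), X2 = c(0, 2, 1, 1),
                         X3 = c(1, 2, 0, 1))

# apply() passes `default = 4` through to first_dup for every row
rep_col <- unname(apply(data_input, 1, first_dup, default = 4))
data_input$rep <- rep_col
```

Handling the no-duplicate case inside the helper avoids the separate is.na() step, which could otherwise also overwrite a row whose duplicated value is itself NA.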

CodePudding user response:

Another solution, based on base R:

nCols <- ncol(data_input)

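data_output <- cbind(
 data_input, rep = apply(data_input, 1,
  # read the most frequent value from the table's names; rows with no
  # repeated value get nCols + 1
  function(x) if (length(table(x)) != nCols)
   as.numeric(names(which.max(table(x)))) else nCols + 1))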

data_output

#>   X1 X2 X3 rep
#> 1  0  0  1   0
#> 2  1  2  2   2
#> 3  2  1  0   4
#> 4  1  1  1   1
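One thing worth checking in any table()-based approach: which.max(table(x)) returns a position within the table, whose entries are ordered by value, not a position within x. The sketch below uses a made-up row, not taken from the question, where the two differ.

```r
x <- c(2, 1, 1)        # made-up row; 1 is the repeated value

tab <- table(x)        # counts ordered by value: "1" occurs twice, "2" once
pos <- which.max(tab)  # position of the largest count within the table

by_position <- x[pos]              # indexes x with a table position
by_name <- as.numeric(names(pos))  # reads the value from the table's names

by_position
by_name
```

Here by_name gives the repeated value 1, while by_position picks x[1], which is 2. For the data in the question both happen to agree, so the difference only shows up on other inputs.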