I have the data frame below, and I want to find the duplicated values in each row of this data frame. Please see the input and output example below. 0 is repeated 2 times in the first row, so column rep should be 0 (data_input[1,"rep"]=0); 2 is repeated 2 times in the second row, so column rep should be 2; there are no repeated values in the 3rd row, so column rep can be 4 (or any value other than 0, 1, 2); and 1 is repeated 3 times in the 4th row, so column rep should be 1.
data_input=data.frame(X1=c(0,1,2,1), X2=c(0,2,1,1),
X3=c(1,2,0,1))
data_output=data.frame(X1=c(0,1,2,1),
X2=c(0,2,1,1), X3=c(1,2,0,1), rep=c(0,2,4,1))
Answer:
Here is an option with rowwise: group the data row by row, take the first duplicated element of each row, and if there is none, replace the resulting NA with 4.
library(dplyr)
library(tidyr)  # for replace_na()

data_input %>%
  rowwise() %>%
  mutate(rep = {
    tmp <- c_across(everything())           # all values of the current row
    replace_na(tmp[duplicated(tmp)][1], 4)  # first repeated value, or 4 if none
  }) %>%
  ungroup()
Output:
# A tibble: 4 × 4
X1 X2 X3 rep
<dbl> <dbl> <dbl> <dbl>
1 0 0 1 0
2 1 2 2 2
3 2 1 0 4
4 1 1 1 1
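To see where the NA comes from, here is the per-row expression applied to two plain vectors taken from rows 2 and 3 of data_input. This is a base R sketch only; `ifelse()` stands in for `tidyr::replace_na()` so that no packages are needed.

```r
# Row 2 of data_input: 2 appears twice
tmp <- c(1, 2, 2)
duplicated(tmp)                      # FALSE FALSE TRUE: only later occurrences are flagged
first_dup <- tmp[duplicated(tmp)][1]
first_dup                            # 2

# Row 3: no repeats, so the subset is empty and [1] turns it into NA
tmp <- c(2, 1, 0)
no_dup <- tmp[duplicated(tmp)][1]
no_dup                               # NA
# base R stand-in for replace_na(no_dup, 4)
ifelse(is.na(no_dup), 4, no_dup)     # 4
```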
The solution above does not handle rows with more than one duplicated value. If that can happen, either create a list column or paste the unique duplicated elements together into a single string:
data_input %>%
  rowwise() %>%
  mutate(rep = {
    tmp <- c_across(everything())
    # all distinct repeated values, comma-separated; "" when there are none
    tmp <- toString(sort(unique(tmp[duplicated(tmp)])))
    replace(tmp, tmp == "", "4")
  }) %>%
  ungroup()
Output:
# A tibble: 4 × 4
X1 X2 X3 rep
<dbl> <dbl> <dbl> <chr>
1 0 0 1 0
2 1 2 2 2
3 2 1 0 4
4 1 1 1 1
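The list-column option is mentioned but not shown. A minimal base R sketch of it (not the answerer's code; `dup_list` is a hypothetical name) stores every distinct repeated value of each row, and leaves rows without repeats as an empty vector instead of a sentinel like 4. The input is recreated so the snippet runs on its own.

```r
data_input <- data.frame(X1 = c(0, 1, 2, 1), X2 = c(0, 2, 1, 1),
                         X3 = c(1, 2, 0, 1))

m <- as.matrix(data_input)
# For each row, keep every distinct value that occurs more than once;
# rows with no repeats get numeric(0)
dup_list <- lapply(seq_len(nrow(m)), function(i) {
  x <- unname(m[i, ])
  sort(unique(x[duplicated(x)]))
})

# Assigning a list of length nrow() with $<- creates a list column
data_input$rep <- dup_list
str(data_input$rep)
```

With a list column, a row such as c(1, 1, 2, 2) would keep both 1 and 2 instead of only the first duplicate.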
Or, using base R (the \(x) lambda shorthand needs R >= 4.1):
data_input$rep <- apply(data_input, 1, FUN = \(x) x[anyDuplicated(x)][1])
data_input$rep[is.na(data_input$rep)] <- 4
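Why the second line is needed: `anyDuplicated()` returns the index of the first element that repeats an earlier one, or 0 when nothing repeats. Indexing with 0 selects nothing, and `[1]` on an empty vector gives NA. A short illustration on rows 1 and 3 of the example:

```r
x <- c(0, 0, 1)            # row 1 of data_input
anyDuplicated(x)           # 2: index of the first element that repeats an earlier one
x[anyDuplicated(x)]        # 0

y <- c(2, 1, 0)            # row 3: no repeats
anyDuplicated(y)           # 0
y[anyDuplicated(y)]        # numeric(0): indexing with 0 selects nothing
y[anyDuplicated(y)][1]     # NA, which is then replaced with 4
```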
Answer:
Another solution, based on base R:
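nCols <- ncol(data_input)
data_output <- cbind(data_input, rep = apply(data_input, 1, function(x) {
  tab <- table(x)  # how often each distinct value occurs in the row
  # fewer distinct values than columns means something repeats: return the
  # most frequent value (the table names hold the values as strings)
  if (length(tab) != nCols) as.numeric(names(which.max(tab))) else nCols + 1
}))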
data_output
#> X1 X2 X3 rep
#> 1 0 0 1 0
#> 2 1 2 2 2
#> 3 2 1 0 4
#> 4 1 1 1 1
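One detail worth checking in the `table()` approach: `which.max(table(x))` is a position within the table, whose entries are the distinct values in sorted order, not a position within `x`. Using it to index `x` directly happens to work for the four example rows, but not in general. An illustration with a hypothetical row `c(5, 1, 1)`:

```r
x <- c(5, 1, 1)                # hypothetical row: 1 is the repeated value
tab <- table(x)                # distinct values in sorted order: "1" (2 times), "5" (1 time)
pos <- which.max(tab)          # 1: a position within tab, not within x
x[pos]                         # 5: indexing x with it returns the wrong value
as.numeric(names(tab)[pos])    # 1: the most frequent value, read from the table names
```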