I have a data set (n=500) in R that looks like this:
ID A C S
1 4 4 4
2 3 2 3
3 5 4 2
I'd like to create a new variable (I am calling this variable "same") that tells me which of my columns have the same value (excluding my ID column). So,
ID A C S Same
1 4 4 4 all
2 3 2 3 as
3 5 4 2 none
4 7 7 2 ac
Any help would be much appreciated! I am pretty lost! Thank you!
CodePudding user response:
We may loop over the rows with apply (MARGIN = 1) on the selected columns ([-1] drops the 'ID' column), then check the length of the unique elements. If it is 1, return 'all'; otherwise paste the lowercased names of the duplicated elements. If there are no duplicates, paste returns a blank "", which is then changed to 'none'.
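df1$Same <- apply(df1[-1], 1, \(x) {
  x1 <- if (length(unique(x)) == 1) {
    'all'  # a single unique value: every column matches
  } else {
    # flag every column whose value occurs more than once in the row
    dup <- duplicated(x) | duplicated(x, fromLast = TRUE)
    paste(tolower(names(x))[dup], collapse = "")
  }
  x1[x1 == ""] <- "none"  # no ties: paste() returned ""
  x1
})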
-output
> df1
ID A C S Same
1 1 4 4 4 all
2 2 3 2 3 as
3 3 5 4 2 none
4 4 7 7 2 ac
data
df1 <- structure(list(ID = 1:4, A = c(4L, 3L, 5L, 7L), C = c(4L, 2L,
4L, 7L), S = c(4L, 3L, 2L, 2L)), class = "data.frame", row.names = c(NA,
-4L))
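To see why the answer combines duplicated(x) with duplicated(x, fromLast = TRUE), it may help to walk through single rows by hand. The sketch below uses rows 2 and 3 of the example data as named vectors; the object names (x, fwd, bwd, tied, label, y, y_tied, y_label) are only illustrative and are not part of the answer.

```r
# Row 2 of the example data (A = 3, C = 2, S = 3) as a named vector
x <- c(A = 3L, C = 2L, S = 3L)

# duplicated() alone flags only the later occurrence of a repeated value
fwd <- duplicated(x)                   # FALSE FALSE  TRUE
# scanning from the end flags only the earlier occurrence
bwd <- duplicated(x, fromLast = TRUE)  #  TRUE FALSE FALSE

# OR-ing both directions marks every member of the tied group
tied <- fwd | bwd                      #  TRUE FALSE  TRUE

label <- paste(tolower(names(x))[tied], collapse = "")
label                                  # "as"

# Row 3 (A = 5, C = 4, S = 2) has no ties, so nothing is selected
y <- c(A = 5L, C = 4L, S = 2L)
y_tied <- duplicated(y) | duplicated(y, fromLast = TRUE)
y_label <- paste(tolower(names(y))[y_tied], collapse = "")
y_label                                # "" (the answer then recodes this to "none")
```

Using only duplicated(x) would give "s" for row 2, because the first 3 (column A) is not counted as a duplicate of anything that came before it.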