Creating a new variable based on conditions of 3 other variables in R

Time:07-15

I have a data set (n = 500) in R that looks like this:

ID    A      C      S
1     4      4      4 
2     3      2      3
3     5      4      2

I'd like to create a new variable (I am calling it "Same") that tells me which of my columns, if any, have the same value (excluding my ID column). So, for example:

ID    A      C      S     Same
1     4      4      4     all
2     3      2      3     as
3     5      4      2     none
4     7      7      2     ac

Any help would be much appreciated! I am pretty lost! Thank you!

CodePudding user response:

We can loop over the rows with apply (MARGIN = 1) on the selected columns (df1[-1] drops the 'ID' column) and check the number of unique values in each row. If it is 1, return 'all'; otherwise paste together the lowercased names of the columns whose values are duplicated. When a row has no duplicates, the paste returns an empty string "", which is then changed to 'none'.

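# the \(x) lambda shorthand requires R >= 4.1
df1$Same <- apply(df1[-1], 1, \(x) {
  if (length(unique(x)) == 1) return("all")  # every column matches
  # flag each value that occurs more than once in the row
  dup <- duplicated(x) | duplicated(x, fromLast = TRUE)
  x1 <- paste(tolower(names(x))[dup], collapse = "")
  if (x1 == "") "none" else x1  # no duplicates gives an empty string
})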

-output

> df1
  ID A C S Same
1  1 4 4 4  all
2  2 3 2 3   as
3  3 5 4 2 none
4  4 7 7 2   ac
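
To see why both duplicated() calls are needed, it may help to run the check on a single row by hand. The sketch below rebuilds two rows of the example as named vectors (the variable names x, y, fwd, bwd and the label_* results are only for illustration) and shows that one call flags the later repeat, the other flags the earlier one, and a row with no repeats pastes to an empty string.

```r
# Row with ID 2, as a named vector similar to what apply() passes in
x <- c(A = 3L, C = 2L, S = 3L)

fwd <- duplicated(x)                   # flags only the later repeat (S)
bwd <- duplicated(x, fromLast = TRUE)  # flags only the earlier repeat (A)
dup <- fwd | bwd                       # together: every value that has a twin

label_x <- paste(tolower(names(x))[dup], collapse = "")

# Row with ID 3: no repeated values, so nothing is selected
y <- c(A = 5L, C = 4L, S = 2L)
dup_y <- duplicated(y) | duplicated(y, fromLast = TRUE)
label_y <- paste(tolower(names(y))[dup_y], collapse = "")

print(label_x)
print(label_y)
```

The empty string in the second case is what the answer's code turns into 'none'.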

data

df1 <- structure(list(ID = 1:4, A = c(4L, 3L, 5L, 7L), C = c(4L, 2L, 
4L, 7L), S = c(4L, 3L, 2L, 2L)), class = "data.frame", row.names = c(NA, 
-4L))
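
If the data always has exactly the three columns A, C and S and no missing values (both are assumptions, not something stated in the question), a row-wise loop is not strictly necessary. The following is a rough alternative sketch, not the method from the answer above: it compares the columns pairwise and picks a label with nested ifelse(). The names ac, as_, cs and same_vec are hypothetical.

```r
df1 <- data.frame(ID = 1:4,
                  A = c(4L, 3L, 5L, 7L),
                  C = c(4L, 2L, 4L, 7L),
                  S = c(4L, 3L, 2L, 2L))

# Pairwise equality of the three columns (assumes no NA values)
ac  <- df1$A == df1$C
as_ <- df1$A == df1$S
cs  <- df1$C == df1$S

# With three columns, two equal pairs imply all three values match
same_vec <- ifelse(ac & as_, "all",
            ifelse(ac,  "ac",
            ifelse(as_, "as",
            ifelse(cs,  "cs", "none"))))

print(same_vec)
```

This only works because the set of columns is small and fixed; with more columns the number of pairs grows quickly and the apply() version is easier to maintain.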