Is there a way in R to create a new column that references the values in two columns based on an ID

Time:06-14

For example, let's take the 6 x 2 data set below:

ID  Val
1   yes
2   yes
3   yes
2   no
3   yes
1   no

People with the same ID are in the same "group". I'd like to add a column that indicates whether the people in a group gave the same answer, i.e. whether Val[person 1] == Val[person 2].

CodePudding user response:

In tidyverse:

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(same = n_distinct(Val) == 1)  # TRUE when the group has a single distinct Val

# A tibble: 6 x 3
# Groups:   ID [3]
     ID Val   same 
  <int> <chr> <lgl>
1     1 yes   FALSE
2     2 yes   FALSE
3     3 yes   TRUE 
4     2 no    FALSE
5     3 yes   TRUE 
6     1 no    FALSE

In base R (the \(x) lambda shorthand needs R 4.1 or later):

transform(df, same = ave(Val, ID, FUN = \(x) length(unique(x))) == 1)
  ID Val  same
1  1 yes FALSE
2  2 yes FALSE
3  3 yes  TRUE
4  2  no FALSE
5  3 yes  TRUE
6  1  no FALSE
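
For anyone who wants to run the base R idea end to end, here is a minimal sketch. It rebuilds the example data from the question and, instead of ave(), uses tapply() plus a lookup by ID, so the per-group result stays logical rather than being written back into a character column first. The names df and same_by_id are only illustrative.

```r
# Example data from the question
df <- data.frame(
  ID  = c(1L, 2L, 3L, 2L, 3L, 1L),
  Val = c("yes", "yes", "yes", "no", "yes", "no")
)

# One logical per ID: TRUE when the group has a single distinct Val
same_by_id <- tapply(df$Val, df$ID, function(x) length(unique(x)) == 1)

# Look up each row's ID in the per-group result; as.vector() drops the names
df$same <- as.vector(same_by_id[as.character(df$ID)])

print(df)
```

This should give the same column as the transform() call above, and it does not depend on how many rows each ID has.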

CodePudding user response:

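We could use lag and fill. Note that this compares each row only with the previous row in its group, so it assumes each ID has exactly two rows.

library(dplyr)
library(tidyr)

df %>% 
  group_by(ID) %>% 
  mutate(signify = Val == lag(Val)) %>%   # NA for the first row of each group
  fill(signify, .direction = "up")        # copy the result up into that first row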
     ID Val   signify
  <int> <chr> <lgl>  
1     1 yes   FALSE  
2     2 yes   FALSE  
3     3 yes   TRUE   
4     2 no    FALSE  
5     3 yes   TRUE   
6     1 no    FALSE 
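
One thing worth checking before relying on a lag-based comparison is what happens when a group has more than two rows. The sketch below uses a made-up three-row group and a base R stand-in for dplyr::lag() (the vals, prev, pairwise and whole_group names are hypothetical) to contrast the row-by-row comparison with a whole-group check.

```r
# A hypothetical group with three answers
vals <- c("yes", "no", "no")

# Base R stand-in for dplyr::lag(): shift the values down by one position
prev <- c(NA, head(vals, -1))

# Row-by-row comparison with the previous row (first row has nothing to compare to)
pairwise <- vals == prev

# Whole-group check: is there only one distinct answer?
whole_group <- length(unique(vals)) == 1

print(pairwise)
print(whole_group)
```

Here the last row compares equal to its predecessor even though the group as a whole disagrees, so for groups larger than two a distinct-count check is probably the safer choice.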

CodePudding user response:

Using data.table

library(data.table)
setDT(df1)[, same := uniqueN(Val) == 1, ID]

-output

> df1
      ID    Val   same
   <int> <char> <lgcl>
1:     1    yes  FALSE
2:     2    yes  FALSE
3:     3    yes   TRUE
4:     2     no  FALSE
5:     3    yes   TRUE
6:     1     no  FALSE

Or with fndistinct from collapse

library(collapse)
df1$same <- (fndistinct(df1$Val, g = df1$ID)==1)[as.character(df1$ID)]

data

df1 <- structure(list(ID = c(1L, 2L, 3L, 2L, 3L, 1L), Val = c("yes", 
"yes", "yes", "no", "yes", "no")), class = "data.frame", row.names = c(NA, 
-6L))

CodePudding user response:

Try this (it relies on each ID having exactly two rows):

df %>% group_by(ID) %>% summarise(A = Reduce(`==` ,Val )) -> df1

merge(df , df1)
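
A small sketch of why the group size matters for Reduce with `==`: the fold compares the first two values, then compares that logical result with the next value, which is no longer a comparison between answers. The variable names below are only illustrative, and all_same is a hypothetical helper, not part of the answer above.

```r
# Two identical answers: a single comparison, as intended
two_same <- Reduce(`==`, c("yes", "yes"))

# Three identical answers: ("yes" == "yes") is TRUE, and then TRUE == "yes"
# is evaluated as "TRUE" == "yes"
three_same <- Reduce(`==`, c("yes", "yes", "yes"))

# A check that does not depend on group size (hypothetical helper)
all_same <- function(x) all(x == x[1])

print(two_same)
print(three_same)
print(all_same(c("yes", "yes", "yes")))
```

With three matching answers the Reduce version reports FALSE, whereas comparing every value against the first one reports TRUE, so something like all_same could be swapped into the summarise() call if groups may be larger than two.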