I have a data frame, my_df:
my_df <- structure(list(C1 = c("A", "X", "X", "A", "A"), F2 = c("A", "A",
"A", "A", "A"), T3 = c("A", "A", "X", "X", "A"), S4 = c("A",
"A", "A", "A", "X"), B5 = c("A", "A", "A", "A", "A")), class = "data.frame", row.names = c("ID1",
"ID2", "ID3", "ID4", "ID5"))
> my_df
    C1 F2 T3 S4 B5
ID1  A  A  A  A  A
ID2  X  A  A  A  A
ID3  X  A  X  A  A
ID4  A  A  X  A  A
ID5  A  A  A  X  A
I want to create a new column, new_col, that says "same" if the values in all
other columns of a row are identical, and "diff" otherwise. I.e., the resulting
data frame would look like:
> my_df
    C1 F2 T3 S4 B5 new_col
ID1  A  A  A  A  A    same
ID2  X  A  A  A  A    diff
ID3  X  A  X  A  A    diff
ID4  A  A  X  A  A    diff
ID5  A  A  A  X  A    diff
What is the best way to achieve this using dplyr?
CodePudding user response:
library(tidyverse)
my_df <- structure(list(C1 = c("A", "X", "X", "A", "A"),
F2 = c("A", "A", "A", "A", "A"),
T3 = c("A", "A", "X", "X", "A"),
S4 = c("A", "A", "A", "A", "X"),
B5 = c("A", "A", "A", "A", "A")),
class = "data.frame",
row.names = c("ID1","ID2", "ID3", "ID4", "ID5"))
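# Evaluate each row on its own and count the distinct values across all columns.
# everything() is spelled out because newer dplyr versions expect an explicit
# column selection in c_across().
my_df %>%
  rowwise() %>%
  mutate(new_col = if_else(
    length(unique(c_across(everything()))) == 1,
    "same",
    "diff"
  ))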
#> # A tibble: 5 × 6
#> # Rowwise:
#>   C1    F2    T3    S4    B5    new_col
#>   <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 A     A     A     A     A     same
#> 2 X     A     A     A     A     diff
#> 3 X     A     X     A     A     diff
#> 4 A     A     X     A     A     diff
#> 5 A     A     A     X     A     diff
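If rowwise() turns out to be slow on a larger data frame, a vectorised variant might look like the sketch below. It assumes dplyr 1.0.4 or later (for if_all()) and compares every column against C1, which is simply the first column of this example. The name result is only illustrative.

```r
library(dplyr)

my_df <- data.frame(
  C1 = c("A", "X", "X", "A", "A"),
  F2 = c("A", "A", "A", "A", "A"),
  T3 = c("A", "A", "X", "X", "A"),
  S4 = c("A", "A", "A", "A", "X"),
  B5 = c("A", "A", "A", "A", "A"),
  row.names = c("ID1", "ID2", "ID3", "ID4", "ID5")
)

# if_all() is TRUE for a row only when every selected column equals C1 there.
result <- my_df %>%
  mutate(new_col = if_else(if_all(everything(), ~ .x == C1), "same", "diff"))
```

Note that a row containing NA may end up with NA in new_col here, because the == comparison itself returns NA.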
CodePudding user response:
There are several ways to do this. One is to check whether every value in a row equals the value in the first column:
# base R (two equivalent alternatives; run only one of them, on the original my_df)
my_df$new_col <- ifelse(rowSums(my_df == my_df[, 1]) == ncol(my_df), "same", "diff")
my_df$new_col <- ifelse(apply(my_df == my_df[, 1], 1, all), "same", "diff")
#dplyr
my_df %>%
dplyr::mutate(new_col = ifelse(rowSums(. == .[, 1]) == ncol(.), "same", "diff"))
    C1 F2 T3 S4 B5 new_col
ID1  A  A  A  A  A    same
ID2  X  A  A  A  A    diff
ID3  X  A  X  A  A    diff
ID4  A  A  X  A  A    diff
ID5  A  A  A  X  A    diff
You can also check whether each row contains exactly one unique value. This returns a logical vector, which can be wrapped in ifelse() to produce "same"/"diff":
apply(my_df, 1, function(x) length(unique(x)) == 1)
#apply(my_df, 1, function(x) dplyr::n_distinct(x) == 1)
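Put together, the unique-value check could be turned into the requested column roughly as follows. This is a minimal base R sketch on the example data; all_same is just an illustrative helper name.

```r
my_df <- data.frame(
  C1 = c("A", "X", "X", "A", "A"),
  F2 = c("A", "A", "A", "A", "A"),
  T3 = c("A", "A", "X", "X", "A"),
  S4 = c("A", "A", "A", "A", "X"),
  B5 = c("A", "A", "A", "A", "A"),
  row.names = c("ID1", "ID2", "ID3", "ID4", "ID5")
)

# One logical per row: TRUE when the row holds a single distinct value.
all_same <- apply(my_df, 1, function(x) length(unique(x)) == 1)

# Map the logical vector to the labels asked for in the question.
my_df$new_col <- ifelse(all_same, "same", "diff")
```

Computing all_same before adding new_col matters: once the column exists, apply() would include it in the comparison.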
CodePudding user response:
A data.table option using uniqueN:
library(data.table)
# (condition) + 1 turns FALSE/TRUE into index 1/2, picking "diff" or "same" per row
setDT(my_df)[, new_col := c("diff", "same")[(uniqueN(unlist(.SD)) == 1) + 1], by = 1:nrow(my_df)]
my_df
Output:
   C1 F2 T3 S4 B5 new_col
1:  A  A  A  A  A    same
2:  X  A  A  A  A    diff
3:  X  A  X  A  A    diff
4:  A  A  X  A  A    diff
5:  A  A  A  X  A    diff
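The output above no longer shows the ID1 to ID5 row names, since a data.table does not carry row names. If they are needed, one possible variant is sketched below. It assumes the row names are unique; the column name ID and the object name dt are illustrative choices, not part of the original answer.

```r
library(data.table)

my_df <- data.frame(
  C1 = c("A", "X", "X", "A", "A"),
  F2 = c("A", "A", "A", "A", "A"),
  T3 = c("A", "A", "X", "X", "A"),
  S4 = c("A", "A", "A", "A", "X"),
  B5 = c("A", "A", "A", "A", "A"),
  row.names = c("ID1", "ID2", "ID3", "ID4", "ID5")
)

# Copy the row names into a regular column called ID.
dt <- as.data.table(my_df, keep.rownames = "ID")

# Grouping by ID gives one group per row; .SD leaves out the grouping column,
# so only C1..B5 are compared.
dt[, new_col := c("diff", "same")[(uniqueN(unlist(.SD)) == 1) + 1], by = ID]
```

Using as.data.table() instead of setDT() also leaves the original my_df unmodified.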