I have a data frame that looks like this:
df <- data.frame(col1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B",
"C", "C", "C", "C", "C"),
col2 = c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E",
"A", "B", "C", "D", "E"))
What I want is this:
df <- data.frame(col1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B",
"C", "C", "C", "C", "C"),
col2 = c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E",
"A", "B", "C", "D", "E"),
col3 = c("1","0","0","0","0","1","1","0","0","0","1","1","1","0","0"))
In col3, a pair that has already appeared (in either order) is marked 1 and a new pair is marked 0. Row 6 is considered a duplicate because the swapped pair ("B", "A") was already counted in row 2 as the unique pair ("A", "B"). Rows where col1 and col2 are identical (rows 1, 7 and 13) are also marked 1. I can easily do this in Excel using the IF and COUNTIF functions. Thanks in advance!
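The Excel approach mentioned above (IF plus COUNTIF over the rows above the current one) can be translated to R fairly literally. This is a minimal sketch, assuming the rule is exactly as described: a row gets "1" if the same unordered pair occurred in an earlier row or if col1 equals col2, and "0" otherwise. The helper name countif_flag is hypothetical, and the loop favours readability over speed.

```r
# Hypothetical helper: a literal translation of the IF/COUNTIF idea.
# Assumes col1 and col2 are character vectors of equal length.
countif_flag <- function(col1, col2) {
  # Order-independent key for each row, e.g. ("B", "A") -> "A|B"
  key <- paste(pmin(col1, col2), pmax(col1, col2), sep = "|")
  flag <- character(length(key))
  for (i in seq_along(key)) {
    # Like COUNTIF over the rows above the current row (none for row 1)
    seen_before <- any(key[seq_len(i - 1)] == key[i])
    flag[i] <- if (seen_before || col1[i] == col2[i]) "1" else "0"
  }
  flag
}

df <- data.frame(col1 = rep(c("A", "B", "C"), each = 5),
                 col2 = rep(c("A", "B", "C", "D", "E"), times = 3),
                 stringsAsFactors = FALSE)
df$col3 <- countif_flag(df$col1, df$col2)
df$col3
```

It returns the strings "1" and "0" to match the desired col3; as.integer(df$col3) converts them to numbers if needed.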
Answer:
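Does this work?
library(dplyr)    # mutate(), select() and %>%
library(stringr)  # str_c()
df %>% mutate(col4 = str_c(col1, col2)) %>%
mutate(col5 = lapply(col4, function(x) paste(sort(unlist(strsplit(x, ''))), collapse = ''))) %>%  # order-independent key
mutate(col3 = +(duplicated(col5) | (col1 == col2))) %>%  # + turns TRUE/FALSE into 1/0
select(col1, col2, col3)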
col1 col2 col3
1 A A 1
2 A B 0
3 A C 0
4 A D 0
5 A E 0
6 B A 1
7 B B 1
8 B C 0
9 B D 0
10 B E 0
11 C A 1
12 C B 1
13 C C 1
14 C D 0
15 C E 0
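One caveat with the strsplit() key above: it sorts the individual characters of the concatenated string, so it only behaves as intended when every value is a single character, as in the question. With longer values, different pairs can collapse to the same key. A small sketch comparing that key with a per-value sort; the values "AB", "C", "A" and "BC" are made up for illustration, and the helper names char_key and pair_key are hypothetical:

```r
# Key as built in the answer above: concatenate, split into characters, sort
char_key <- function(a, b) {
  vapply(paste0(a, b),
         function(x) paste(sort(strsplit(x, "")[[1]]), collapse = ""),
         character(1), USE.NAMES = FALSE)
}
# Alternative: order the two values themselves and keep a separator
pair_key <- function(a, b) paste(pmin(a, b), pmax(a, b), sep = "|")

a <- c("AB", "A")
b <- c("C", "BC")
char_key(a, b)              # both rows become "ABC"
pair_key(a, b)              # "AB|C" and "A|BC" stay distinct
duplicated(char_key(a, b))  # a false duplicate in row 2
duplicated(pair_key(a, b))  # no duplicates
```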
Answer:
We can use pmin and pmax to sort the two values within each row, and then apply duplicated to check for duplicates:
transform(
df,
# + converts the logical result to 1/0
col3 = +(duplicated(paste(pmin(col1, col2), pmax(col1, col2))) | col1 == col2)
)
which gives
col1 col2 col3
1 A A 1
2 A B 0
3 A C 0
4 A D 0
5 A E 0
6 B A 1
7 B B 1
8 B C 0
9 B D 0
10 B E 0
11 C A 1
12 C B 1
13 C C 1
14 C D 0
15 C E 0
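To see why the pmin/pmax version works, it helps to print the intermediate key: pmin() and pmax() compare the two columns element-wise, so each row is rewritten with the alphabetically smaller value first, and swapped pairs end up with identical keys. A minimal sketch on the first six rows of the question's data (the name df6 is just for this example):

```r
df6 <- data.frame(col1 = c("A", "A", "A", "A", "A", "B"),
                  col2 = c("A", "B", "C", "D", "E", "A"),
                  stringsAsFactors = FALSE)

# Element-wise smaller and larger value of each row, pasted into one key
key <- paste(pmin(df6$col1, df6$col2), pmax(df6$col1, df6$col2))
key
# Row 6 ("B", "A") produces the same key as row 2 ("A", "B")
duplicated(key)
# The unary + turns the logical result into integer 1/0
+(duplicated(key) | df6$col1 == df6$col2)
```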
Answer:
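Here is one option where we look for any duplicates (after sorting each row) or rows where col1 and col2 are the same. The + returns a binary (1/0) for the logical.
# apply() sorts each row; t() turns the result back into one row per original row
df$col3 <- +(duplicated(t(apply(df, 1, sort))) | df$col1 == df$col2)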
Output
col1 col2 col3
1 A A 1
2 A B 0
3 A C 0
4 A D 0
5 A E 0
6 B A 1
7 B B 1
8 B C 0
9 B D 0
10 B E 0
11 C A 1
12 C B 1
13 C C 1
14 C D 0
15 C E 0
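A note on the apply() version: apply() first converts the data frame to a character matrix and sorts every column of each row, so it should only see col1 and col2. If the data frame already has other columns (for example after col3 has been added once), those would be pulled into the sort as well. A sketch that selects the two columns explicitly and shows the intermediate matrix; duplicated() on a matrix compares whole rows. The small data frame df_small is made up for illustration:

```r
df_small <- data.frame(col1 = c("A", "B", "B"),
                       col2 = c("B", "A", "B"),
                       stringsAsFactors = FALSE)

# Sort each row; apply() returns one column per row, hence t()
sorted_pairs <- t(apply(df_small[c("col1", "col2")], 1, sort))
sorted_pairs
# duplicated() on a matrix flags repeated rows
df_small$col3 <- +(duplicated(sorted_pairs) | df_small$col1 == df_small$col2)
df_small
```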
Answer:
Try this:
# TRUE where the first column holds only digits and dots
column <- grepl("^[.0-9]+$", dat[,1])
column
dat2 <- data.frame(Sex = dat[cbind(seq_len(nrow(dat)),1+column)], Length =
dat[cbind(seq_len(nrow(dat)),2-column)])
dat2$Length <- as.numeric(dat2$Length)
dat2
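The code above comes without sample data. As far as can be read from it, it assumes a two-column data frame dat in which some rows have Sex and Length swapped: grepl() flags rows whose first column is purely numeric, and the column index 1 + column or 2 - column then picks the other column for those rows. A hypothetical dat to try it on (the values are made up for illustration):

```r
# Hypothetical input: rows 2 and 3 have the length in the first column
dat <- data.frame(V1 = c("M", "12.5", "8"),
                  V2 = c("10.2", "F", "M"),
                  stringsAsFactors = FALSE)

# TRUE where the first column holds only digits and dots
column <- grepl("^[.0-9]+$", dat[, 1])

# cbind(row, col) picks one cell per row: column 2 for swapped rows, else 1
# stringsAsFactors = FALSE is the default since R 4.0.0; it matters for
# as.numeric() on older versions
dat2 <- data.frame(Sex = dat[cbind(seq_len(nrow(dat)), 1 + column)],
                   Length = dat[cbind(seq_len(nrow(dat)), 2 - column)],
                   stringsAsFactors = FALSE)
dat2$Length <- as.numeric(dat2$Length)
dat2
```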