How to count swapped characters between two columns in R

Time:09-07

I have a data frame that looks like this:

df <- data.frame(col1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", 
                      "C", "C", "C", "C", "C"), 
             col2 = c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E", 
                      "A", "B", "C", "D", "E"))

What I want is this:

df <- data.frame(col1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", 
                      "C", "C", "C", "C", "C"), 
             col2 = c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E", 
                      "A", "B", "C", "D", "E"),
             col3 = c("1","0","0","0","0","1","1","0","0","0","1","1","1","0","0"))

            

col3 marks duplicated pairs as 1 and unique pairs as 0. Row 6 counts as a duplicate because its swapped pair ("B", "A") was already seen in row 2 as the unique pair ("A", "B"). Rows where both columns hold the same value, such as row 1, are also marked 1. I can easily do this in Excel with the IF and COUNTIF functions. Thanks in advance!

CodePudding user response:

Does this work?

library(dplyr)
library(stringr)

# col4 joins the two columns, col5 sorts the joined letters so "BA" and "AB" match
df %>% mutate(col4 = str_c(col1, col2)) %>% 
   mutate(col5 = lapply(col4, function(x) paste(sort(unlist(strsplit(x, ''))), collapse = ''))) %>% 
         mutate(col3 = +(duplicated(col5) | (col1 == col2))) %>% 
           select(col1, col2, col3)
   col1 col2 col3
1     A    A    1
2     A    B    0
3     A    C    0
4     A    D    0
5     A    E    0
6     B    A    1
7     B    B    1
8     B    C    0
9     B    D    0
10    B    E    0
11    C    A    1
12    C    B    1
13    C    C    1
14    C    D    0
15    C    E    0

CodePudding user response:

We can use pmin and pmax to put each row's pair in a consistent order (smaller value first), then apply duplicated to flag pairs that have already appeared.

transform(
  df,
  col3 = +(duplicated(paste(pmin(col1, col2), pmax(col1, col2))) | col1 == col2)
)

which gives

   col1 col2 col3
1     A    A    1
2     A    B    0
3     A    C    0
4     A    D    0
5     A    E    0
6     B    A    1
7     B    B    1
8     B    C    0
9     B    D    0
10    B    E    0
11    C    A    1
12    C    B    1
13    C    C    1
14    C    D    0
15    C    E    0
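
To make the pmin/pmax step easier to follow, here is a minimal sketch that builds the order-independent key separately before flagging duplicates. The names key and flag are only illustrative, and stringsAsFactors = FALSE is set on the assumption that the columns should be plain character vectors (pmin and pmax do not work on factors).

```r
# Same data as in the question, kept as character columns
df <- data.frame(
  col1 = rep(c("A", "B", "C"), each = 5),
  col2 = rep(c("A", "B", "C", "D", "E"), times = 3),
  stringsAsFactors = FALSE
)

# Order-independent key: smaller value first, larger value second,
# so ("B", "A") and ("A", "B") both become "A B"
key <- paste(pmin(df$col1, df$col2), pmax(df$col1, df$col2))

head(key, 6)
# [1] "A A" "A B" "A C" "A D" "A E" "A B"

# Flag repeated keys and rows where both columns match;
# the unary + turns TRUE/FALSE into 1/0
flag <- +(duplicated(key) | df$col1 == df$col2)
flag
```

The sixth key equals the second one, which is why row 6 is flagged as a duplicate.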

CodePudding user response:

Here is one option where we look for any duplicated pairs or rows where col1 and col2 are the same. The unary + converts the logical result to binary (1/0).

df$col3 <- +(duplicated(t(apply(df, 1, sort))) | df$col1 == df$col2)

Output

   col1 col2 col3
1     A    A    1
2     A    B    0
3     A    C    0
4     A    D    0
5     A    E    0
6     B    A    1
7     B    B    1
8     B    C    0
9     B    D    0
10    B    E    0
11    C    A    1
12    C    B    1
13    C    C    1
14    C    D    0
15    C    E    0
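
One thing to watch with the apply approach: apply(df, 1, sort) sorts across every column of df, so if col3 (or any other column) already exists, it becomes part of the comparison. The sketch below is a variant, not the answer's exact code, that restricts the row-wise sort to the two columns of interest. The name pair_cols is hypothetical.

```r
# Same data as in the question, kept as character columns
df <- data.frame(
  col1 = rep(c("A", "B", "C"), each = 5),
  col2 = rep(c("A", "B", "C", "D", "E"), times = 3),
  stringsAsFactors = FALSE
)

pair_cols <- c("col1", "col2")  # hypothetical helper: columns forming the pair

# Sort each row's two values so ("B", "A") and ("A", "B") become the same row
sorted_pairs <- t(apply(df[pair_cols], 1, sort))

# duplicated() on a matrix compares whole rows
df$col3 <- +(duplicated(sorted_pairs) | df$col1 == df$col2)

# Because only pair_cols are used, running the same line again after col3
# exists gives the same answer
col3_again <- +(duplicated(t(apply(df[pair_cols], 1, sort))) | df$col1 == df$col2)

df
```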

CodePudding user response:

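Try this:

# `dat` is not defined in this thread; it is assumed to be a two-column data frame
# TRUE where the first column holds a number
column <- grepl("^[.0-9]+$", dat[,1])
column

# Row by row, take the non-numeric cell as Sex and the numeric cell as Length
dat2 <- data.frame(Sex = dat[cbind(seq_len(nrow(dat)), 1 + column)],
                   Length = dat[cbind(seq_len(nrow(dat)), 2 - column)])
dat2$Length <- as.numeric(dat2$Length)
dat2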