How to swap column values in a data.table using R

Time:12-16

I have the toy data given below.

library(data.table)
(tmp <- data.table(R1 = c('D','D','D','T','C'), y = 10:1, R2 = c('D','A','Z','D','D')))

    R1  y R2
 1:  D 10  D
 2:  D  9  A
 3:  D  8  Z
 4:  T  7  D
 5:  C  6  D
 6:  D  5  D
 7:  D  4  A
 8:  D  3  Z
 9:  T  2  D
10:  C  1  D

I want to swap the values in columns R1 and R2 so that every D is listed under R1 and the other (non-common) value goes to R2. Can someone show me how this can be done? Here is the desired output.

    R1  y R2
 1:  D 10  D
 2:  D  9  A
 3:  D  8  Z
 4:  D  7  T
 5:  D  6  C
 6:  D  5  D
 7:  D  4  A
 8:  D  3  Z
 9:  D  2  T
10:  D  1  C

Here are the performance results of the answers provided below:

Unit: milliseconds
    expr      min       lq     mean   median       uq       max neval cld
   akrun 5.524562 5.587740 7.526681 5.605406 5.938955 14.976740     5   b
 r2evans 1.466862 1.489944 1.509321 1.500263 1.536402  1.553134     5  a 
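For anyone wanting to reproduce a comparison like this, here is a minimal sketch. It assumes the timings were produced with the microbenchmark package (the printed layout matches its output, but the question does not say so), and it assumes "D" is the value being collected under R1. Because := modifies a data.table by reference, each expression works on a fresh copy(), so the copy cost is included equally in both timings.

```r
library(data.table)
library(microbenchmark)  # assumption: not named in the question

# Toy data from the question, with the recycling written out explicitly.
base <- data.table(R1 = rep(c("D", "D", "D", "T", "C"), 2),
                   y  = 10:1,
                   R2 = rep(c("D", "A", "Z", "D", "D"), 2))
interesting <- "D"  # assumption: the value to collect under R1

res <- microbenchmark(
  akrun = {
    d <- copy(base)  # fresh copy so earlier runs do not affect this one
    d[R2 == "D", c("R1", "R2") := .(R2, R1)]
  },
  r2evans = {
    d <- copy(base)
    d[, c("R1", "R2") := .(
      fifelse(R2 %in% interesting & !R1 %in% interesting, R2, R1),
      fifelse(R2 %in% interesting & !R1 %in% interesting, R1, R2))]
  },
  times = 5L  # matches neval = 5 in the table above
)
print(res)
```

On a 10-row table the numbers mostly reflect fixed overhead, so they may not say much about how either approach scales.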

CodePudding user response:

Based on the update, we can put a logical expression in i to select the rows where R2 holds the target value, and then assign the two columns in swapped order for just those rows:

library(data.table)
val <- "D"
tmp[R2 == val, c("R1", "R2") := .(R2, R1)]

Output:

> tmp
    R1  y R2
 1:  D 10  D
 2:  D  9  A
 3:  D  8  Z
 4:  D  7  T
 5:  D  6  C
 6:  D  5  D
 7:  D  4  A
 8:  D  3  Z
 9:  D  2  T
10:  D  1  C
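As a self-contained check of the approach above, the following sketch rebuilds the toy data, applies the same swap, and asserts that the result matches the desired output from the question. The stopifnot() checks are an addition for illustration and are not part of the original answer.

```r
library(data.table)

# Toy data from the question, with the recycling written out explicitly.
tmp <- data.table(R1 = rep(c("D", "D", "D", "T", "C"), 2),
                  y  = 10:1,
                  R2 = rep(c("D", "A", "Z", "D", "D"), 2))

val <- "D"

# .(R2, R1) is evaluated on the selected rows before either column is
# overwritten, so no temporary column is needed for the swap.
tmp[R2 == val, c("R1", "R2") := .(R2, R1)]

# R1 should now hold only "D"; y should be untouched.
stopifnot(all(tmp$R1 == val), identical(tmp$y, 10:1))
```

Rows where R1 and R2 are both "D" are also selected by the condition, but swapping them leaves them unchanged.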

CodePudding user response:

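I suspect that the other answer may be the most applicable, but if your needs are not based on lexicographic sorting and instead depend only on membership in a set of "interesting" values, then the following works. (The output shown is from sample data with columns x, R1, and R2, not the tmp printed in the question.)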

interesting <- c("A")
tmp[, c("R1", "R2") := .(
  fifelse(R2 %in% interesting & !R1 %in% interesting, R2, R1),
  fifelse(R2 %in% interesting & !R1 %in% interesting, R1, R2))]
tmp
#         x     R1     R2
#     <int> <char> <char>
#  1:     1      A      A
#  2:     2      A      F
#  3:     3      A      T
#  4:     4      A      G
#  5:     5      A      I
#  6:     6      A      A
#  7:     7      A      F
#  8:     8      A      T
#  9:     9      A      G
# 10:    10      A      I

I confess that this looks a little clumsy, since it calculates the conditional twice. It is easy to compute it once and store it in a temporary variable, either inside or outside of the tmp frame, such as:

tmp[, swap := R2 %in% interesting & !R1 %in% interesting
  ][, c("R1", "R2") := .(fifelse(swap, R2, R1), fifelse(swap, R1, R2))
  ][, swap := NULL]
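The temporary variable can also live outside the table and be used directly in i, which avoids adding and removing a column. This is only a sketch applied to the question's toy data, and it assumes interesting is set to "D" so that the result matches the desired output shown there.

```r
library(data.table)

# Toy data from the question, with the recycling written out explicitly.
tmp <- data.table(R1 = rep(c("D", "D", "D", "T", "C"), 2),
                  y  = 10:1,
                  R2 = rep(c("D", "A", "Z", "D", "D"), 2))

interesting <- c("D")  # assumption: the value(s) to collect under R1

# Compute the condition once, as a plain logical vector outside the table.
swap <- tmp[, R2 %in% interesting & !R1 %in% interesting]

# A single variable name in i is looked up in the calling scope, so this
# swaps R1 and R2 only in the flagged rows.
tmp[swap, c("R1", "R2") := .(R2, R1)]
tmp
```

Only the rows that actually need swapping are touched here, whereas the fifelse() version rewrites both columns in full.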

If you are certain that R2 %in% interesting is perfectly aligned with !R1 %in% interesting (that is, it is never the case that both R1 and R2 are interesting), or you don't care about swapping when both are interesting (as in rows 1 and 6), then you can simplify that down to

tmp[, c("R1", "R2") := .(
  fifelse(R2 %in% interesting, R2, R1),
  fifelse(R2 %in% interesting, R1, R2))]