How to replace a number with NA in R but add the replaced number to a different variable?

I have some data:

set.seed(565)
df <- data.frame(rs1 = rnorm(100, mean = 50, sd = 3), rs2 = rnorm(100, mean = 4, sd = 0.2), rs3 = rnorm(100, mean = 15, sd = 1),
                 rs4 = rnorm(100, mean = 2, sd = 0.04))

I want to replace any number less than 3 in this data frame with NA, and add the replaced number to the variable with the highest value in that row, so that the row total does not change. For example, row 1, which looks like:

50.92756   4.033628   14.36690   1.999160

should look like this after the command:

52.92672   4.033628   14.36690   NA

CodePudding user response：

Here is a base R way.

set.seed(565)
df <- data.frame(rs1 = rnorm(100, mean = 50, sd = 3), rs2 = rnorm(100, mean = 4, sd = 0.2), rs3 = rnorm(100, mean = 15, sd = 1),
                 rs4 = rnorm(100, mean = 2, sd = 0.04))


df[] <- t(apply(df, 1, \(x) {
  i <- which.max(x)
  j <- x < 3
  if(any(j)) {
    x[i] <- x[i] + sum(x[which(j)])
    is.na(x) <- j
  }
  x
}))
head(df, n = 10)
#>         rs1      rs2      rs3 rs4
#> 1  52.92672 4.033628 14.36690  NA
#> 2  52.82045 4.088581 12.49494  NA
#> 3  53.94117 3.635854 15.17427  NA
#> 4  49.97355 4.076953 15.06030  NA
#> 5  53.17885 4.020831 13.92384  NA
#> 6  55.86003 4.064562 14.37932  NA
#> 7  51.34426 4.163213 14.22895  NA
#> 8  56.79130 4.029414 14.90220  NA
#> 9  52.20528 4.135510 15.69041  NA
#> 10 52.70072 4.250440 15.14747  NA

Created on 2022-05-26 by the reprex package (v2.0.1)
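Since the question requires that the row total does not change, a quick sanity check may be worth running. The sketch below wraps the same row-wise logic in a function and compares row sums before and after. The name `move_small_to_max` and its `threshold` argument are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: same row-wise logic as above, wrapped in a function
move_small_to_max <- function(d, threshold = 3) {
  d[] <- t(apply(d, 1, function(x) {
    i <- which.max(x)          # position of the row maximum
    j <- x < threshold         # values to be replaced
    if (any(j)) {
      x[i] <- x[i] + sum(x[j]) # move the small values onto the maximum
      is.na(x) <- j            # then blank them out
    }
    x
  }))
  d
}

set.seed(565)
df <- data.frame(rs1 = rnorm(100, mean = 50, sd = 3),
                 rs2 = rnorm(100, mean = 4, sd = 0.2),
                 rs3 = rnorm(100, mean = 15, sd = 1),
                 rs4 = rnorm(100, mean = 2, sd = 0.04))

before <- rowSums(df)
after  <- rowSums(move_small_to_max(df), na.rm = TRUE)

# Compare totals with a floating-point tolerance
all.equal(before, after)
```

For this data the comparison should come back `TRUE`. Note that the check assumes the row maximum itself is not below the threshold; if it were, the maximum would be set to NA as well and the total would be lost.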

CodePudding user response：

Note that this also works when every value in a row is less than 3; in that case the whole row becomes NA.

idx <- cbind(seq(nrow(df)), max.col(df))
df[idx] <- df[idx] * NA ^ (df[idx] < 3) + rowSums(df * (df < 3))
is.na(df) <- df < 3
df

        rs1      rs2      rs3 rs4
1   52.92672 4.033628 14.36690  NA
2   52.82045 4.088581 12.49494  NA
3   53.94117 3.635854 15.17427  NA
4   49.97355 4.076953 15.06030  NA
5   53.17885 4.020831 13.92384  NA
6   55.86003 4.064562 14.37932  NA
7   51.34426 4.163213 14.22895  NA
8   56.79130 4.029414 14.90220  NA
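The `NA ^ (...)` part may look odd, so here is a small sketch of what it does, run on made-up toy data (the `toy` data frame is an assumption for illustration, not part of the question). The mask of small values is computed once up front, which is a slight variation on the code above, and `ties.method = "first"` is set only to keep the example deterministic.

```r
NA ^ TRUE   # NA: NA to the power 1 stays NA
NA ^ FALSE  # 1:  anything to the power 0 is 1

# Toy data: in row 2 every value is below 3
toy <- data.frame(a = c(10, 1), b = c(2, 2.5), c = c(5, 0.5))

small <- toy < 3   # logical mask of values to replace
idx <- cbind(seq_len(nrow(toy)), max.col(toy, ties.method = "first"))

# Keep the maximum (times 1) or turn it into NA if it is itself below 3,
# then add the sum of the small values in that row
toy[idx] <- toy[idx] * NA ^ (toy[idx] < 3) + rowSums(toy * small)
is.na(toy) <- small
toy
```

Row 1 should end up as 12, NA, 5 (the 2 is moved onto the 10), and row 2 should be all NA because its maximum is also below 3.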

CodePudding user response：

Something like this would work.

set.seed(565)

df <- data.frame(rs1 = rnorm(100, mean = 50, sd = 3), rs2 = rnorm(100, mean = 4, sd = 0.2), rs3 = rnorm(100, mean = 15, sd = 1),
                 rs4 = rnorm(100, mean = 2, sd = 0.04))

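# Flag the values below 3
min_index <- df < 3

# Flag each row's maximum
max_index <- df == apply(df, 1, max)

# The two extractions pair up only when each row has exactly one value below 3
# and both come out in the same row order, as they do for this data
df[max_index] <- df[max_index] + df[min_index]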

df[min_index] <- NA

df

#         rs1      rs2      rs3 rs4
#1   52.92672 4.033628 14.36690  NA
#2   52.82045 4.088581 12.49494  NA