I have some data:
set.seed(565)
df <- data.frame(rs1 = rnorm(100, mean = 50, sd = 3), rs2 = rnorm(100, mean = 4, sd = 0.2), rs3 = rnorm(100, mean = 15, sd = 1),
rs4 = rnorm(100, mean = 2, sd = 0.04))
I want to replace any number less than 3 in this data frame with NA, and add each replaced number to the variable with the highest value in that row, so that the row total does not change. For example, row 1, which looks like:
50.92756 4.033628 14.36690 1.999160
after the command should look like:
52.92672 4.033628 14.36690 NA
CodePudding user response:
Here is a base R way.
set.seed(565)
df <- data.frame(rs1 = rnorm(100, mean = 50, sd = 3), rs2 = rnorm(100, mean = 4, sd = 0.2), rs3 = rnorm(100, mean = 15, sd = 1),
rs4 = rnorm(100, mean = 2, sd = 0.04))
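df[] <- t(apply(df, 1, \(x) {
  i <- which.max(x)   # position of the row maximum
  j <- x < 3          # values to be replaced
  if(any(j)) {
    # add the values below 3 to the maximum, then blank them out
    x[i] <- x[i] + sum(x[which(j)])
    is.na(x) <- j
  }
  x
}))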
head(df, n = 10)
#> rs1 rs2 rs3 rs4
#> 1 52.92672 4.033628 14.36690 NA
#> 2 52.82045 4.088581 12.49494 NA
#> 3 53.94117 3.635854 15.17427 NA
#> 4 49.97355 4.076953 15.06030 NA
#> 5 53.17885 4.020831 13.92384 NA
#> 6 55.86003 4.064562 14.37932 NA
#> 7 51.34426 4.163213 14.22895 NA
#> 8 56.79130 4.029414 14.90220 NA
#> 9 52.20528 4.135510 15.69041 NA
#> 10 52.70072 4.250440 15.14747 NA
Created on 2022-05-26 by the reprex package (v2.0.1)
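As a quick sanity check on the requirement that the row total must not change, the sketch below reruns the same steps and compares the row sums before and after. The names `before` and `after` are only illustrative, and `function(x)` is used in place of `\(x)` so that it also runs on R versions older than 4.1.

```r
set.seed(565)
df <- data.frame(rs1 = rnorm(100, mean = 50, sd = 3),
                 rs2 = rnorm(100, mean = 4, sd = 0.2),
                 rs3 = rnorm(100, mean = 15, sd = 1),
                 rs4 = rnorm(100, mean = 2, sd = 0.04))

# row totals before anything is replaced
before <- rowSums(df)

df[] <- t(apply(df, 1, function(x) {
  i <- which.max(x)
  j <- x < 3
  if (any(j)) {
    x[i] <- x[i] + sum(x[j])
    is.na(x) <- j
  }
  x
}))

# row totals afterwards, ignoring the NA cells
after <- rowSums(df, na.rm = TRUE)

# compare with a numeric tolerance rather than exact equality
all.equal(before, after)
```

The comparison uses `all.equal()` because adding the values in a different order can change the last few bits of a floating-point sum.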
CodePudding user response:
Note that this works whether or not every value in a row is less than 3:
idx <- cbind(seq(nrow(df)), max.col(df))
df[idx] <- df[idx] * NA ^ (df[idx] < 3) + rowSums(df * (df < 3))
is.na(df) <- df < 3
df
rs1 rs2 rs3 rs4
1 52.92672 4.033628 14.36690 NA
2 52.82045 4.088581 12.49494 NA
3 53.94117 3.635854 15.17427 NA
4 49.97355 4.076953 15.06030 NA
5 53.17885 4.020831 13.92384 NA
6 55.86003 4.064562 14.37932 NA
7 51.34426 4.163213 14.22895 NA
8 56.79130 4.029414 14.90220 NA
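To illustrate the edge case mentioned above, here is a small sketch on a made-up two-row data frame (the columns `a`, `b`, `c` and their values are hypothetical, not part of the question's data). In the second row every value is below 3, so the row maximum itself has to become NA and the whole row should end up as NA.

```r
# toy data: row 1 has one value below 3, row 2 has only values below 3
df <- data.frame(a = c(10, 1), b = c(2, 2), c = c(5, 2.5))

# (row, column) position of each row maximum
idx <- cbind(seq(nrow(df)), max.col(df))

# NA ^ FALSE is 1 and NA ^ TRUE is NA, so a maximum below 3 turns into NA;
# otherwise the sum of the row's values below 3 is added to it
df[idx] <- df[idx] * NA ^ (df[idx] < 3) + rowSums(df * (df < 3))

# blank out everything that is still below 3
is.na(df) <- df < 3
df
```

In the first row the 2 in `b` is moved onto `a`; in the second row there is no value of 3 or more to receive the total, so it cannot be preserved.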
CodePudding user response:
Something like this would work for this data, where each row has exactly one value below 3.
set.seed(565)
df <- data.frame(rs1 = rnorm(100, mean = 50, sd = 3), rs2 = rnorm(100, mean = 4, sd = 0.2), rs3 = rnorm(100, mean = 15, sd = 1),
rs4 = rnorm(100, mean = 2, sd = 0.04))
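# cells below the threshold
min_index <- df < 3
# cells holding the maximum of their row
max_index <- df == apply(df, 1, max)
# add the small value to the row maximum, then blank it out
df[max_index] <- df[max_index] + df[min_index]
df[min_index] <- NA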
df
# rs1 rs2 rs3 rs4
#1 52.92672 4.033628 14.36690 NA
#2 52.82045 4.088581 12.49494 NA
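If a row could hold several values below 3, or none, the two logical indexes would no longer pair up one to one. A minimal sketch of a variant that sums the small values per row first is given below; the function name `move_small_to_max`, its `threshold` argument and the toy data are assumptions made for illustration, and it assumes all columns are numeric.

```r
move_small_to_max <- function(df, threshold = 3) {
  m <- as.matrix(df)
  small <- m < threshold                  # cells to be replaced
  extra <- rowSums(m * small)             # per-row total of those cells
  # (row, column) position of each row maximum; first column wins ties
  idx <- cbind(seq_len(nrow(m)), max.col(m, ties.method = "first"))
  m[idx] <- m[idx] + extra
  m[small] <- NA
  df[] <- m                               # keep the data frame shape and names
  df
}

# toy data: row 1 has two values below 3, row 2 has one
toy <- data.frame(a = c(10, 20), b = c(1, 2), c = c(2, 5))
res <- move_small_to_max(toy)
res
```

As with the other approaches, a row whose maximum is itself below the threshold ends up entirely NA, so its total cannot be kept.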