ifelse with sorted values by row

I have a dataframe (example data):

id <- c(1, 2, 3)
ex1 <- c(0.8,   0.2, 0.3)
ex2 <- c(0.1,   0.4, 0.04)
ex3 <- c(0.04,  0.3, 0.5)
ex <- c(1, 1, 1)
ran <- c(0.5, 0.7, 0.6)
dat <- data.frame(id, ex1, ex2, ex3, ex, ran)

dat
  id ex1  ex2  ex3 ex ran
1  1 0.8 0.10 0.04  1 0.5
2  2 0.2 0.40 0.30  1 0.7
3  3 0.3 0.04 0.50  1 0.6

I want to change the values of "ex" with an if-else condition. "ex" should change to 5 (an arbitrary value) when "ran" is less than or equal to the highest value among the ex$ variables. "ran" should also be greater than the sum of the other two ex$ variables, that is, the second-largest value added to the smallest value, so the values have to be sorted within each row. Here are examples for all ids, beginning with id 1:

dat$ex <- ifelse(dat$ran <= dat$ex1 & dat$ran > dat$ex2 + dat$ex3, 5, dat$ex)

Here, ex1 is the largest value, followed by ex2 and then ex3.

For id 2, it should be:

dat$ex <- ifelse(dat$ran <= dat$ex2 & dat$ran > dat$ex3 + dat$ex1, 5, dat$ex)

Here, ex2 is the largest value, followed by ex3 and then ex1.

For id 3:

dat$ex <- ifelse(dat$ran <= dat$ex3 & dat$ran > dat$ex1 + dat$ex2, 5, dat$ex)

Here, ex3 is the largest value, followed by ex1 and then ex2.

Now to the problem: How to generalize the ifelse-statement? Note: It is important that the summation of the two smaller values is performed as implemented in the examples. I need to identify the sorted values for ex1, ex2 and ex3 within ifelse by id.

CodePudding user response：

exes <- t(apply(subset(dat, select = grep("^ex.+", names(dat))), 1, function(z) c(max(z), sum(z[-which.max(z)]))))
exes
#   [,1] [,2]
# 1  0.8 0.14
# 2  0.4 0.50
# 3  0.5 0.34

ifelse(dat$ran <= exes[,1] & dat$ran > exes[,2], 5, dat$ex)
# 1 2 3 
# 5 1 1 

Walk-through:

subset(dat, ...) is a way to dynamically extract columns from a frame regardless of its type (e.g., data.frame, tbl_df, or data.table) without the risk of dropping the frame to a vector (e.g., mtcars[,2] is no longer a frame). There are other ways to do this, some in base R, some in other packages such as dplyr or data.table.
apply(..., 1, ...) operates on each row of the selected columns. With MARGIN=1 (the second argument), each row's result comes back as a column, so the output is transposed and we need t(.) to put it back into the right shape.
exes is now a matrix whose first column contains the max of the ex# variables, and the second column contains the sum of the non-max ex# variables
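
The steps in the walk-through can be checked one at a time. This is only a sketch on the example data from the question; the names one_col, ex_cols and raw are introduced here for illustration and are not part of the original answer.

```r
# Example data from the question
dat <- data.frame(
  id  = c(1, 2, 3),
  ex1 = c(0.8, 0.2, 0.3),
  ex2 = c(0.1, 0.4, 0.04),
  ex3 = c(0.04, 0.3, 0.5),
  ex  = c(1, 1, 1),
  ran = c(0.5, 0.7, 0.6)
)

# subset() keeps a data frame even when a single column is selected ...
one_col <- subset(dat, select = ex1)
is.data.frame(one_col)   # TRUE
# ... whereas [, j] on a plain data.frame drops to a vector
is.data.frame(dat[, 2])  # FALSE

# "^ex.+" needs at least one character after "ex",
# so it picks ex1, ex2, ex3 but not ex itself
ex_cols <- subset(dat, select = grep("^ex.+", names(dat)))
names(ex_cols)

# apply() over rows returns one column per row when the function
# returns more than one value, hence the t() in the answer
raw <- apply(ex_cols, 1, function(z) c(max(z), sum(z[-which.max(z)])))
dim(raw)     # 2 3
dim(t(raw))  # 3 2
```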

From here, I think the use of exes is the "general" solution you were looking for.
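
If the goal is to write the result back into dat$ex, the same two quantities can also be computed without apply(). The following is a hedged sketch of a swapped-in technique, not the method from the answer above: it uses pmax() for the row-wise maximum and rowSums() minus that maximum for the sum of the remaining values. The names ex_names, row_max and row_rest are assumptions made for this example. Because the sum is obtained by subtraction, it can differ from a direct sum in the last floating-point digits, which only matters if "ran" sits exactly on a boundary.

```r
# Example data from the question
dat <- data.frame(
  id  = c(1, 2, 3),
  ex1 = c(0.8, 0.2, 0.3),
  ex2 = c(0.1, 0.4, 0.04),
  ex3 = c(0.04, 0.3, 0.5),
  ex  = c(1, 1, 1),
  ran = c(0.5, 0.7, 0.6)
)

# Column names ex1, ex2, ex3 (hypothetical helper name)
ex_names <- grep("^ex.+", names(dat), value = TRUE)

# Row-wise maximum of the ex# columns
row_max <- do.call(pmax, dat[ex_names])

# Sum of the two non-maximum values: row total minus the maximum
row_rest <- rowSums(dat[ex_names]) - row_max

# Set ex to 5 where ran lies in (row_rest, row_max], otherwise keep ex
dat$ex <- unname(ifelse(dat$ran <= row_max & dat$ran > row_rest, 5, dat$ex))
dat$ex
# [1] 5 1 1
```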

CodePudding user response：

Here is one way to do this with dplyr and tidyr:

library(dplyr)
library(tidyr)
dat %>% 
  pivot_longer(
    cols = ex1:ex3
  ) %>% 
  arrange(id, desc(value)) %>% 
  group_by(id) %>% 
  mutate(ex = ifelse(ran <= value[1] & ran > sum(value[2], value[3]), 5, ex)) %>% 
  pivot_wider(
    names_from=name
  )

output:

     id    ex   ran   ex1   ex2   ex3
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     5   0.5   0.8  0.1   0.04
2     2     1   0.7   0.2  0.4   0.3 
3     3     1   0.6   0.3  0.04  0.5
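
The pipeline relies on the values being sorted in descending order within each id, so that value[1] is the largest and value[2], value[3] are the two smaller ones. The same idea can be made explicit in base R; this is an illustrative sketch only, and the names sorted and new_ex as well as the column labels are assumptions introduced here.

```r
# Example data from the question
dat <- data.frame(
  id  = c(1, 2, 3),
  ex1 = c(0.8, 0.2, 0.3),
  ex2 = c(0.1, 0.4, 0.04),
  ex3 = c(0.04, 0.3, 0.5),
  ex  = c(1, 1, 1),
  ran = c(0.5, 0.7, 0.6)
)

# Sort the ex# values of every row from largest to smallest;
# unname() avoids carrying over misleading column names
sorted <- t(apply(dat[c("ex1", "ex2", "ex3")], 1,
                  function(z) sort(unname(z), decreasing = TRUE)))
colnames(sorted) <- c("largest", "second", "smallest")

# Same condition as in the pipeline: ran <= largest and
# ran > second + smallest
new_ex <- ifelse(dat$ran <= sorted[, "largest"] &
                   dat$ran > sorted[, "second"] + sorted[, "smallest"],
                 5, dat$ex)
unname(new_ex)
# [1] 5 1 1
```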