Identifying the position of the first minimum in each row of a dataframe

I am working with many (100) large (50000, 150) dataframes. I am trying to identify the column number for the first minimum in each row.

These dataframes represent 50000 iterations of population simulations. The population initially recovers to equilibrium and then is subjected to a perturbation. The perturbation causes a population decline. After the perturbation ends the population begins to recover to equilibrium again.

I am trying to find at which year the first minimum of the population size occurs after the perturbation, for each iteration.

which.min does not work because the simulation is stochastic: the lowest value after the perturbation is not necessarily the first minimum. For example, if a perturbation is introduced in year 6, the resulting population size vector could look like this:

x <- c(3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 5, 6, 6, 5, 4, 2, 5, 6, 7)

which.min(x)

This does not work because it returns position 16 (value 2), which is not the first minimum. I would like a function that returns position 10 (the second 4), as that is the first minimum after the perturbation.

Even if I were to take which.min(x[7:16]) + 6

I still get position 16, which is not correct. I cannot truncate the range of the vector on the other side either, for example which.min(x[7:13]) + 6, as the first minimum of some iterations does not fall within that range. I have tried several user-defined functions, but I can't quite get it.

Whatever the solution, it is probably necessary to use the year of perturbation (in this case 6) as the initial year, but I cannot figure out how to identify that first minimum.

CodePudding user response：

We can find when the values are decreasing and get the last decreased value of the first 'group'.

i1 <- rle(diff(x) < 0)
#position
sum(i1$lengths[1:which(i1$values)[1]]) + 1
[1] 10
#Minimum value
x[sum(i1$lengths[1:which(i1$values)[1]]) + 1]
[1] 4

To apply it rowwise on a data frame, we can wrap it in a function and apply, i.e.

f1 <- function(x){
  i1 <- rle(diff(x) < 0)
  return(sum(i1$lengths[1:which(i1$values)[1]]) + 1)
}

and then

apply(df, 1, f1)
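
The question mentions that the year of perturbation probably needs to be the starting point. A minimal sketch of that idea, using the same rle approach, might look like the following. The function name first_min_after and its start argument are hypothetical, introduced only for illustration, and the sketch assumes the perturbation year is the same for every row. It also returns NA when the series never declines after start, a case in which 1:which(i1$values)[1] would otherwise raise an error.

```r
# Hypothetical helper: position of the end of the first decline
# that begins at or after index `start` (e.g. the perturbation year).
first_min_after <- function(x, start) {
  tail_x <- x[start:length(x)]          # ignore everything before `start`
  runs <- rle(diff(tail_x) < 0)         # runs of decreasing / non-decreasing steps
  first_decline <- which(runs$values)[1]
  if (is.na(first_decline)) return(NA_integer_)  # no decline after `start`
  # steps up to the end of the first decline, shifted back to positions in x
  sum(runs$lengths[seq_len(first_decline)]) + start
}

x <- c(3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 5, 6, 6, 5, 4, 2, 5, 6, 7)
y <- c(3, 4, 5, 3, 7, 8, 7, 6, 5, 4, 5, 6, 6, 5, 4, 2, 5, 6, 7)
sims <- rbind(x, y)                     # one simulation per row

first_min_after(x, 6)
# [1] 10
apply(sims, 1, first_min_after, start = 6)
#  x  y
# 10 10
```

In the second row the dip at year 4 happens before the perturbation, so it is skipped once the search starts at year 6.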

CodePudding user response：

You can get the first value whose lead and lag values are higher:

library(dplyr)
x <- c(3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 5, 6, 6, 5, 4, 2, 5, 6, 7)

firstMin <- function(x) first(which(lag(x, default = -Inf) > x & lead(x, default = Inf) > x))
firstMin(x)
# [1] 10

For multiple rows:

x <- c(3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 5, 6, 6, 5, 4, 2, 5, 6, 7)
y <- c(3, 4, 5, 3, 7, 8, 7, 6, 5, 4, 5, 6, 6, 5, 4, 2, 5, 6, 7)
data <- data.frame(rbind(x,y))

data %>%
  rowwise() %>% 
  mutate(first = names(.)[firstMin(c_across())],
         value = get(first), .before = X1)

  first value    X1    X2    X3    X4    X5    X6    X7    X8    X9   X10   X11   X12   X13   X14   X15   X16   X17   X18   X19
  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 X10       4     3     4     5     6     7     8     7     6     5     4     5     6     6     5     4     2     5     6     7
2 X4        3     3     4     5     3     7     8     7     6     5     4     5     6     6     5     4     2     5     6     7
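
With 100 dataframes of 50000 rows each, a row-wise dplyr pipeline may be slow. As a rough sketch, the same strict "both neighbours are higher" test can be written in base R and run over the rows of a matrix with apply. The name first_strict_min is hypothetical, and no timings have been measured here. Note that a strict comparison does not detect a flat minimum (two equal lowest values side by side), whereas the rle approach from the other answer reports the first position of such a plateau.

```r
# Hypothetical base-R equivalent of firstMin(): first position whose
# left and right neighbours are both strictly higher.
first_strict_min <- function(x) {
  n <- length(x)
  prev_val <- c(-Inf, x[-n])  # left neighbour; -Inf means position 1 never qualifies
  next_val <- c(x[-1], Inf)   # right neighbour; Inf lets the last position qualify
  which(prev_val > x & next_val > x)[1]  # NA if no strict local minimum exists
}

m <- rbind(
  x = c(3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 5, 6, 6, 5, 4, 2, 5, 6, 7),
  y = c(3, 4, 5, 3, 7, 8, 7, 6, 5, 4, 5, 6, 6, 5, 4, 2, 5, 6, 7)
)

apply(m, 1, first_strict_min)
#  x  y
# 10  4

first_strict_min(c(5, 4, 4, 5))  # flat minimum is not detected
# [1] NA
```

For a data frame, as.matrix(df) can be passed to apply in the same way, assuming all columns are numeric.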