Finding the location of half the max value in a column

I have a list of data frames, and I'd like to apply a function to that list to find the location in the "julian" column that corresponds to half the max value in the "total_cover" column. Here's some data that represents the data I have:

df1 <- data.frame(julian = c(81,85,88,97,101,104,126,167),
                  total_cover = c(43,52,75,92,94,97,188,172))
df2 <- data.frame(julian = c(81,85,88,97,101,104,126,167),
                  total_cover = c(30,55,73,80,75,85,138,154))
df3 <- data.frame(julian = c(107,111,115,119,123,129,131,133,135,137),
                  total_cover = c(36,41,43,47,55,55,55,65,75,80))

data.list <- list(df1=df1,df2=df2,df3=df3)

The code below is what I've tried, but I'm not getting the correct output. It doesn't seem to find the julian day that corresponds to half the max value:

unlist(lapply(X = data.list, FUN = function(x){
        x[which.max(x[["total_cover"]] >= which.max(x[["total_cover"]])/2), "julian"]
}))

output:
df1  df2  df3
81   81   107

My ideal output would be what's shown below, with the julian dates that correspond to >= max(total_cover)/2

df1  df2  df3
101  97   111

Using R 4.2.2

CodePudding user response：

find_julian <- function(df){
  #calculate the distance from half of the maximum
  distance <- df[["total_cover"]]- max(df[["total_cover"]])/2
  #find smallest value greater than or equal to half of the maximum and select corresponding julian
  df[distance==min(distance[distance>=0]),"julian"]
}

unlist(lapply(X = data.list, FUN = find_julian))
df1 df2 df3 
101  97 111
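
One caveat, as far as I can tell: the == comparison returns every row that matches the minimum, so a data frame with a repeated total_cover value can give back more than one julian day, and unlist() then produces names such as df31 and df32. Below is a sketch of one way around that, using made-up data. The names tied and find_julian_first are hypothetical and not part of the question.

```r
# Hypothetical data with a repeated total_cover value (60 appears twice)
tied <- data.frame(julian = c(1, 2, 3, 4),
                   total_cover = c(10, 60, 60, 100))

# Variant of find_julian that keeps only the earliest matching row
find_julian_first <- function(df) {
  distance <- df[["total_cover"]] - max(df[["total_cover"]]) / 2
  # rows at or above half of the maximum (>= so an exact match counts)
  candidates <- which(distance >= 0)
  # which.min() returns the first position of the minimum,
  # so a tie resolves to the earliest row
  df[candidates[which.min(distance[candidates])], "julian"]
}

find_julian_first(tied)  # 2
```

Whether the earliest or the latest tied day is the right one depends on the data, so treat the tie-breaking rule here as an assumption.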

CodePudding user response：

I believe the following answers the question.

sapply(data.list, \(x) {
  half_max <- max(x$total_cover)/2
  d <- abs(x$total_cover - half_max)
  is.na(d) <- x$total_cover < half_max
  x$julian[which.min(d)]
})
#> df1 df2 df3 
#> 101  97 111

Created on 2022-12-13 with reprex v2.0.2

CodePudding user response：

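Here is a step-by-step dplyr solution. The main point is that total_cover minus half the max is negative for some rows, and those rows have to be removed before taking the minimum.

The result of

df1 df2 df3 
 81  81 107

occurs because the inner which.max() returns the position of the maximum, not its value, so every row passes the comparison and the first row is returned.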
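
To see where 81, 81, 107 comes from, it may help to evaluate the pieces of the original expression on df1. This is only a sketch; cover, idx_of_max and first_hit are names made up for the illustration.

```r
# df1 as defined in the question
df1 <- data.frame(julian = c(81,85,88,97,101,104,126,167),
                  total_cover = c(43,52,75,92,94,97,188,172))

cover <- df1[["total_cover"]]

# which.max() returns the POSITION of the maximum, not the maximum itself
idx_of_max <- which.max(cover)   # 7, because 188 is the 7th element
max(cover)                       # 188

# The original code therefore compares against 7/2 = 3.5,
# which every row exceeds
all(cover >= idx_of_max / 2)     # TRUE

# which.max() on an all-TRUE logical vector returns 1, i.e. the first row
df1[which.max(cover >= idx_of_max / 2), "julian"]   # 81

# Comparing against max(cover)/2 instead picks the first row at or above 94
first_hit <- which.max(cover >= max(cover) / 2)
df1[first_hit, "julian"]         # 101
```

So swapping the inner which.max() for max() in the original lapply() call appears to be enough to get 101, 97, 111 on the sample data.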

Long version:

library(dplyr)

bind_rows(data.list, .id = 'id') %>% 
  group_by(id) %>% 
  mutate(x = (max(total_cover)/2)) %>% 
  mutate(y = total_cover-x) %>% 
  filter(y >=0) %>% 
  filter(y == min(y)) %>% 
  select(1:2) %>% 
  pull(julian, name = id)

Or a little shorter:

bind_rows(data.list, .id = 'id') %>% 
  group_by(id) %>% 
  filter(total_cover-(max(total_cover)/2) >=0) %>% 
  filter(total_cover == min(total_cover)) %>% 
  select(1:2) %>% 
  pull(julian, name = id)

result:

df1 df2 df3 
101  97 111
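
One thing that may be worth checking on real data: filtering and then taking the minimum picks the row whose total_cover is the smallest value at or above half the max, which is not necessarily the earliest julian day that reaches half the max. On the sample data the two agree. A sketch with a made-up data frame shows where they can differ (toy, first_day and closest_day are hypothetical names, not from the question):

```r
# Hypothetical data where cover dips after first crossing half the max
toy <- data.frame(julian = c(1, 2, 3, 4, 5),
                  total_cover = c(10, 60, 50, 100, 90))
half_max <- max(toy$total_cover) / 2   # 50

# Earliest row at or above half the max
first_day <- toy$julian[which.max(toy$total_cover >= half_max)]

# Row whose cover is the smallest value at or above half the max
at_or_above <- toy[toy$total_cover >= half_max, ]
closest_day <- at_or_above$julian[which.min(at_or_above$total_cover)]

first_day    # 2
closest_day  # 3
```

Which of the two is wanted depends on what "location of half the max" should mean for the data at hand.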