I have a list of data frames, and I'd like to apply a function to that list to find the value in the "julian" column that corresponds to half the max value in the "total_cover" column. Here's some sample data representative of what I have:
df1 <- data.frame(julian = c(81, 85, 88, 97, 101, 104, 126, 167),
                  total_cover = c(43, 52, 75, 92, 94, 97, 188, 172))
df2 <- data.frame(julian = c(81, 85, 88, 97, 101, 104, 126, 167),
                  total_cover = c(30, 55, 73, 80, 75, 85, 138, 154))
df3 <- data.frame(julian = c(107, 111, 115, 119, 123, 129, 131, 133, 135, 137),
                  total_cover = c(36, 41, 43, 47, 55, 55, 55, 65, 75, 80))
data.list <- list(df1 = df1, df2 = df2, df3 = df3)
The code below is what I've tried, but I'm not getting the correct output. It doesn't seem to find the julian day that corresponds to half the max value:
unlist(lapply(X = data.list, FUN = function(x){
x[which.max(x[["total_cover"]] >= which.max(x[["total_cover"]])/2), "julian"]
}))
output:
df1 df2 df3
81 81 107
My ideal output would be what's shown below, with the julian dates that correspond to >= max(total_cover)/2
df1 df2 df3
101 97 111
Using R 4.2.2
CodePudding user response:
find_julian <- function(df){
  # calculate the distance from half of the maximum
  distance <- df[["total_cover"]] - max(df[["total_cover"]])/2
  # find the smallest value greater than or equal to half of the maximum
  # and select the corresponding julian
  df[distance == min(distance[distance >= 0]), "julian"]
}
unlist(lapply(X = data.list, FUN = find_julian))
df1 df2 df3
101 97 111
CodePudding user response:
I believe the following answers the question.
sapply(data.list, \(x) {
  half_max <- max(x$total_cover)/2
  d <- abs(x$total_cover - half_max)
  is.na(d) <- x$total_cover < half_max
  x$julian[which.min(d)]
})
#> df1 df2 df3
#> 101 97 111
Created on 2022-12-13 with reprex v2.0.2
CodePudding user response:
Here is a step-by-step dplyr solution. The key step is to drop the rows where the difference between total_cover and half the maximum is negative.
The result of
df1 df2 df3
81 81 107
occurs because which.max() returns the position of the maximum, not its value, so the threshold ends up far too small and the first row always passes.
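To see where 81 81 107 comes from, it may help to evaluate the pieces of the question's expression on df1 alone. This is a minimal base R sketch; the names cover, idx_of_max, val_of_max, wrong and right are illustrative and not from the original post.

```r
# df1 copied from the question
df1 <- data.frame(julian = c(81, 85, 88, 97, 101, 104, 126, 167),
                  total_cover = c(43, 52, 75, 92, 94, 97, 188, 172))

cover <- df1[["total_cover"]]

idx_of_max <- which.max(cover)  # position of the maximum, not its value
val_of_max <- max(cover)        # the maximum itself

# Question's comparison: the threshold is a row position divided by 2,
# so every cover value passes and which.max() on the logical vector
# returns the first row.
wrong <- df1[which.max(cover >= idx_of_max / 2), "julian"]

# Comparing against max(cover) / 2 instead picks the first row whose
# cover reaches half the maximum.
right <- df1[which.max(cover >= val_of_max / 2), "julian"]

print(c(wrong = wrong, right = right))
```

Under this reading, swapping the inner which.max() for max() in the question's lapply() call should be enough on its own, since which.max() applied to a logical vector returns the position of the first TRUE.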
Long version:
library(dplyr)
bind_rows(data.list, .id = 'id') %>%
  group_by(id) %>%
  mutate(x = (max(total_cover)/2)) %>%
  mutate(y = total_cover - x) %>%
  filter(y >= 0) %>%
  filter(y == min(y)) %>%
  select(1:2) %>%
  pull(julian, name = id)
Or a little shorter:
bind_rows(data.list, .id = 'id') %>%
  group_by(id) %>%
  filter(total_cover - (max(total_cover)/2) >= 0) %>%
  filter(total_cover == min(total_cover)) %>%
  select(1:2) %>%
  pull(julian, name = id)
result:
df1 df2 df3
101 97 111
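One caveat: the answers above pick the row with the smallest total_cover at or above half the maximum, which is not always the earliest julian day that reaches it. On the question's data the two readings agree, but they can differ when cover dips after an early peak. The sketch below uses a made-up data frame (toy, not from the question) to show the difference; the names first_day and closest_day are also illustrative.

```r
# Hypothetical data, not from the question: cover dips after an early peak
toy <- data.frame(julian = c(100, 110, 120, 130),
                  total_cover = c(10, 90, 60, 100))

half_max <- max(toy$total_cover) / 2
above <- toy$total_cover >= half_max   # rows at or above half the maximum

# Reading 1: the earliest day at or above half the maximum
first_day <- toy$julian[which(above)[1]]

# Reading 2: the day whose cover is the smallest value at or above
# half the maximum (what the answers above compute)
closest_day <- toy$julian[above][which.min(toy$total_cover[above])]

print(c(first_day = first_day, closest_day = closest_day))
```

Which reading is wanted depends on whether "the julian day that corresponds to half the max" means the first crossing or the closest value, so it is probably worth deciding that before choosing between the approaches.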