Perhaps this would be better presented as 2 questions.
The larger issue is that I'm writing a for loop (nested inside another for loop) that subsets the data frame to the rows where days equals i or, if there are none, to the row(s) with the latest information. For clarification, the loop runs over the range 1:90, and there is no data for most iterations (values of i). To account for this I've written an ifelse(is.na(), IF, ELSE); the ELSE branch seems to work, but I'm struggling to code the IF branch.
Naturally, I'm presenting a simplified version:
df <- data.frame(days = c(7, 17, 20, 22, 42, 55, 55, 82, 168, 251, 308))
for(i in 1:90)
{
latest <- ifelse(is.na(df$days[i])== TRUE,
subset(df, days == min(days >=i)),
subset(df, days == i))
}
Which brings me to what I expect is the central issue. I've been playing around with min(), and that seems to be where my code goes wrong:
> i = 1
> df$days
[1] 7 17 20 22 42 55 55 82 168 251 308
> min(df$days >= i)
[1] 1
My intention is to return the smallest value that is at or above i, which in this example would be 7. Instead, min(df$days >= i) returns 1, the same as i.
CodePudding user response:
I think this is a compact and robust way of doing what you seem to be after.
library(tidyverse)
# For each i, keep the rows with days <= i, then the row(s) with the largest such day
lapply(
  1:90,
  \(i) df %>% filter(days <= i) %>% slice_max(days)
)
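For anyone who would rather avoid the tidyverse dependency, roughly the same idea can be sketched in base R. This is only a sketch under assumptions: df is the data frame from the question, "latest" is taken to mean the largest days value at or before i, and latest_up_to is a hypothetical helper name, not something from the question.

```r
# Data from the question
df <- data.frame(days = c(7, 17, 20, 22, 42, 55, 55, 82, 168, 251, 308))

# Hypothetical helper: row(s) with the largest days value that is <= i
latest_up_to <- function(df, i) {
  seen <- df[df$days <= i, , drop = FALSE]
  # Nothing recorded yet: return the empty data frame rather than call max() on nothing
  if (nrow(seen) == 0) return(seen)
  # Ties are kept, so both rows with days == 55 come back together
  seen[seen$days == max(seen$days), , drop = FALSE]
}

res <- lapply(1:90, function(i) latest_up_to(df, i))

nrow(res[[5]])   # 0, no data at or before day 5
res[[21]]$days   # 20
res[[60]]$days   # 55 55
```

The explicit check for an empty subset is there because max() on a zero-length vector returns -Inf with a warning.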
CodePudding user response:
The reason why you get 1 is because df$days >= i returns a logical vector which is then interpreted as numeric for min (TRUE as 1 and FALSE as 0):
df <- data.frame(days = c(7, 17, 20, 22, 42, 55, 55, 82, 168, 251, 308))
i <- 1
df$days >= i
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
as.numeric(df$days >= i)
#> [1] 1 1 1 1 1 1 1 1 1 1 1
min(df$days >= i)
#> [1] 1
Created on 2022-06-10 by the reprex package (v1.0.0)
As user2974951 mentioned, you need to use this logical vector to subset your original data:
min(df$days[df$days >= i])
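To tie this back to the loop in the question, here is a minimal sketch, assuming the goal is "the rows for day i if there are any, otherwise the rows for the next recorded day". The helper name rows_at_or_after is hypothetical, and the guard for an empty subset is my addition, since min() of a zero-length vector returns Inf with a warning. Note that ifelse() is vectorised and returns a vector shaped like its test, so it is not suited to returning a data frame; a plain function with if is used here instead.

```r
df <- data.frame(days = c(7, 17, 20, 22, 42, 55, 55, 82, 168, 251, 308))

# Hypothetical helper: rows whose day is the smallest value >= i
rows_at_or_after <- function(df, i) {
  candidates <- df$days[df$days >= i]
  # No day at or after i: return an empty data frame
  if (length(candidates) == 0) {
    return(df[0, , drop = FALSE])
  }
  # If day i itself is present, min(candidates) equals i, so both cases are covered
  df[df$days == min(candidates), , drop = FALSE]
}

latest <- lapply(1:90, function(i) rows_at_or_after(df, i))

latest[[1]]$days   # 7
latest[[43]]$days  # 55 55
```

Collecting the results in a list keeps every iteration's subset, whereas assigning to latest inside the for loop would overwrite it each time.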