Home > Enterprise >  Count the number of entries that fall within a range of dates in a separate dataframe in R
Count the number of entries that fall within a range of dates in a separate dataframe in R

Time:07-13

I am trying to count the number of rows in df1, which contains the date of an event,

df1 = data.frame(date = c("2021-07-31", "2021-08-01", "2021-08-12", "2021-08-14"))

that fall within the start and end dates of df2,

df2 = data.frame(Id = c(1,2), 
                 Start = c("2021-06-01", "2021-08-01"), 
                 End = c("2021-08-15", "2021-09-15"))

In this example, the output would look like

  Id      Start        End Count
1  1 2021-06-01 2021-08-15     3
2  2 2021-08-01 2021-09-15     3

I have tried similar examples How to get the number of counts between two dates in R? and count row if date falls within date range for all dates in series in R without any success.

Any help or suggestions would be greatly appreciated. Thank you!

CodePudding user response:

Please note: should Id 1 count be 4 in your expected output?

You can group_by your data and sum the dates that fall %within% the interval like this:

df1 = data.frame(date = c("2021-07-31", "2021-08-01", "2021-08-12", "2021-08-14"))
df2 = data.frame(Id = c(1,2), 
                 Start = c("2021-06-01", "2021-08-01"), 
                 End = c("2021-08-15", "2021-09-15"))

library(dplyr)
library(lubridate)
df2 %>%
  group_by(Id) %>%
  mutate(Count = sum(as_date(df1$date) %within% lubridate::interval(Start, End)))
#> # A tibble: 2 × 4
#> # Groups:   Id [2]
#>      Id Start      End        Count
#>   <dbl> <chr>      <chr>      <int>
#> 1     1 2021-06-01 2021-08-15     4
#> 2     2 2021-08-01 2021-09-15     3

Created on 2022-07-12 by the reprex package (v2.0.1)

CodePudding user response:

Using data.table::between in outer.

f <- Vectorize(\(i, j) data.table::between(df1[i, 1L], df2[j, 2], df2[j, 3]))
transform(df2, count=colSums(outer(seq_len(nrow(df1)), seq_len(nrow(df2)), f)))
#   Id      Start        End count
# 1  1 2021-06-01 2021-08-15     4
# 2  2 2021-08-01 2021-09-15     3

Note, that "Date" format is required, so you may want to do this beforehand:

df1[] <- lapply(df1, as.Date)
df2[-1] <- lapply(df2[-1], as.Date)

Data:

df1 <- structure(list(date = structure(c(18839, 18840, 18851, 18853), class = "Date")), row.names = c(NA, 
-4L), class = "data.frame")

df2 <- structure(list(Id = c(1, 2), Start = structure(c(18779, 18840
), class = "Date"), End = structure(c(18854, 18885), class = "Date")), row.names = c(NA, 
-2L), class = "data.frame")

CodePudding user response:

Or with base:

df2$Count <- apply(df2, 1, function(x) sum(as.Date(df1$date) %in% seq(as.Date(x["Start"]), as.Date(x["End"]), by = "1 day")))

Output:

  Id      Start        End Count
1  1 2021-06-01 2021-08-15     4
2  2 2021-08-01 2021-09-15     3
  •  Tags:  
  • r
  • Related