I am trying to count the number of rows in df1, which contains the date of an event,
df1 = data.frame(date = c("2021-07-31", "2021-08-01", "2021-08-12", "2021-08-14"))
that fall within the start and end dates of df2,
df2 = data.frame(Id = c(1,2),
Start = c("2021-06-01", "2021-08-01"),
End = c("2021-08-15", "2021-09-15"))
In this example, the output would look like
Id Start End Count
1 1 2021-06-01 2021-08-15 3
2 2 2021-08-01 2021-09-15 3
I have tried similar examples How to get the number of counts between two dates in R? and count row if date falls within date range for all dates in series in R without any success.
Any help or suggestions would be greatly appreciated. Thank you!
CodePudding user response:
Please note: should Id 1 count be 4 in your expected output?
You can group_by
your data and sum the dates that fall %within%
the interval
like this:
df1 = data.frame(date = c("2021-07-31", "2021-08-01", "2021-08-12", "2021-08-14"))
df2 = data.frame(Id = c(1,2),
Start = c("2021-06-01", "2021-08-01"),
End = c("2021-08-15", "2021-09-15"))
library(dplyr)
library(lubridate)
df2 %>%
group_by(Id) %>%
mutate(Count = sum(as_date(df1$date) %within% lubridate::interval(Start, End)))
#> # A tibble: 2 × 4
#> # Groups: Id [2]
#> Id Start End Count
#> <dbl> <chr> <chr> <int>
#> 1 1 2021-06-01 2021-08-15 4
#> 2 2 2021-08-01 2021-09-15 3
Created on 2022-07-12 by the reprex package (v2.0.1)
CodePudding user response:
Using data.table::between
in outer
.
f <- Vectorize(\(i, j) data.table::between(df1[i, 1L], df2[j, 2], df2[j, 3]))
transform(df2, count=colSums(outer(seq_len(nrow(df1)), seq_len(nrow(df2)), f)))
# Id Start End count
# 1 1 2021-06-01 2021-08-15 4
# 2 2 2021-08-01 2021-09-15 3
Note, that "Date"
format is required, so you may want to do this beforehand:
df1[] <- lapply(df1, as.Date)
df2[-1] <- lapply(df2[-1], as.Date)
Data:
df1 <- structure(list(date = structure(c(18839, 18840, 18851, 18853), class = "Date")), row.names = c(NA,
-4L), class = "data.frame")
df2 <- structure(list(Id = c(1, 2), Start = structure(c(18779, 18840
), class = "Date"), End = structure(c(18854, 18885), class = "Date")), row.names = c(NA,
-2L), class = "data.frame")
CodePudding user response:
Or with base
:
df2$Count <- apply(df2, 1, function(x) sum(as.Date(df1$date) %in% seq(as.Date(x["Start"]), as.Date(x["End"]), by = "1 day")))
Output:
Id Start End Count
1 1 2021-06-01 2021-08-15 4
2 2 2021-08-01 2021-09-15 3