I want to get the count of cases per day, including zeros. This is my data frame example:
set.seed(1453); ID = sample(1:4, 10, TRUE)
date = c('2016-01-01', '2016-01-05', '2016-01-07', '2016-01-12', '2016-01-16', '2016-01-20',
'2016-01-20', '2016-01-25', '2016-01-26', '2016-01-31')
df = data.frame(ID, date = as.Date(date))
So I know that there was one case for ID
1
on 2016-01-01
, then one case for ID
1
on 2016-01-20
. So I want to get a data frame from 2016-01-01
to 2016-01-31
with 1
on those two days and 0
otherwise. I will like the same for each ID
. So this example shows one event per ID
, but I have up to 15 cases per day per ID
in my actual data frame.
I have used:
M <- function(timeStamps) {
Dates <- as.Date(strftime(df$date, "%Y-%m-%d"))
allDates <- seq(from = min(Dates), to = max(Dates), by = "day")
Admission <- sapply(allDates, FUN = function(X) sum(Dates == X))
data.frame(day = allDates, Admission = Admission)
}
MM<-M(df$date)
But MM
will only give me the result I want if I create a data frame for each ID
.
I have done the same exercise using this example, but I get monthly aggregate results here. Ideally, I would be able to aggregate a similar data frame per day, considering 0
events per ID
.
CodePudding user response:
With the current function, we can split
the 'date' by 'ID' column, apply the function, and rbind
the list
output to a single data.frame with ID
as another column
lst1 <- lapply(split(df$date, df$ID), M)
out <- do.call(rbind, Map(cbind, ID = names(lst1), lst1))
row.names(out) <- NULL
-output
> str(out)
'data.frame': 124 obs. of 3 variables:
$ ID : chr "1" "1" "1" "1" ...
$ day : Date, format: "2016-01-01" "2016-01-02" "2016-01-03" "2016-01-04" ...
$ Admission: int 1 0 0 0 1 0 1 0 0 0 ...
> head(out)
ID day Admission
1 1 2016-01-01 1
2 1 2016-01-02 0
3 1 2016-01-03 0
4 1 2016-01-04 0
5 1 2016-01-05 1
6 1 2016-01-06 0
Or using tidyverse
, do a group by operation
library(dplyr)
library(tidyr)
df %>%
group_by(ID) %>%
summarise(out = M(date), .groups = 'drop') %>%
unpack(out)
-output
# A tibble: 124 × 3
ID day Admission
<int> <date> <int>
1 1 2016-01-01 1
2 1 2016-01-02 0
3 1 2016-01-03 0
4 1 2016-01-04 0
5 1 2016-01-05 1
6 1 2016-01-06 0
7 1 2016-01-07 1
8 1 2016-01-08 0
9 1 2016-01-09 0
10 1 2016-01-10 0
# … with 114 more rows