Aggregate per day counting different IDs in R

Time:04-28

I want to get the count of cases per day, including zeros. This is my data frame example:

set.seed(1453); ID = sample(1:4, 10, TRUE)
date = c('2016-01-01', '2016-01-05', '2016-01-07',  '2016-01-12',  '2016-01-16',  '2016-01-20',
         '2016-01-20',  '2016-01-25',  '2016-01-26',  '2016-01-31')
df = data.frame(ID, date = as.Date(date))

So there was one case for ID 1 on 2016-01-01, then another case for ID 1 on 2016-01-20. I want a data frame covering 2016-01-01 to 2016-01-31 with 1 on those two days and 0 otherwise, and I would like the same for each ID. This example shows single events, but my actual data frame has up to 15 cases per day per ID.

I have used:

M <- function(timeStamps) {
  Dates <- as.Date(strftime(df$date, "%Y-%m-%d"))
  allDates <- seq(from = min(Dates), to = max(Dates), by = "day")
  Admission <- sapply(allDates, FUN = function(X) sum(Dates == X))
  data.frame(day = allDates, Admission = Admission)
}
MM<-M(df$date)

But MM will only give me the result I want if I create a data frame for each ID.

I have done the same exercise using this example, but I get monthly aggregate results here. Ideally, I would be able to aggregate a similar data frame per day, considering 0 events per ID.

CodePudding user response:

With the current function, we can split the 'date' column by 'ID', apply the function to each piece, and rbind the resulting list into a single data.frame with ID as an extra column:

lst1 <- lapply(split(df$date, df$ID), M)
out <- do.call(rbind, Map(cbind, ID = names(lst1), lst1))
row.names(out) <- NULL

-output

> str(out)
'data.frame':   124 obs. of  3 variables:
 $ ID       : chr  "1" "1" "1" "1" ...
 $ day      : Date, format: "2016-01-01" "2016-01-02" "2016-01-03" "2016-01-04" ...
 $ Admission: int  1 0 0 0 1 0 1 0 0 0 ...
> head(out)
  ID        day Admission
1  1 2016-01-01         1
2  1 2016-01-02         0
3  1 2016-01-03         0
4  1 2016-01-04         0
5  1 2016-01-05         1
6  1 2016-01-06         0
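
One caveat: M as written reads df$date instead of its timeStamps argument, so every ID receives the counts for the whole data frame. That appears to be why ID 1 shows a case on 2016-01-05 in the output above. Below is a minimal sketch of a variant that counts only the dates passed in while still spanning the full date range. The names count_per_day, allDates, lst2 and out2 are hypothetical and not part of the original post.

```r
# Rebuild the example data from the question.
set.seed(1453)
ID <- sample(1:4, 10, TRUE)
date <- as.Date(c("2016-01-01", "2016-01-05", "2016-01-07", "2016-01-12", "2016-01-16",
                  "2016-01-20", "2016-01-20", "2016-01-25", "2016-01-26", "2016-01-31"))
df <- data.frame(ID, date)

# Hypothetical helper: count how many entries of `timeStamps` fall on each
# day of `allDates`. Unlike M, it uses its argument, not the global df.
count_per_day <- function(timeStamps, allDates) {
  Admission <- sapply(allDates, function(d) sum(timeStamps == d))
  data.frame(day = allDates, Admission = Admission)
}

# One shared calendar, so every ID gets all days, including zero-count days.
allDates <- seq(min(df$date), max(df$date), by = "day")

# Split the dates by ID, count per day within each ID, then stack the pieces.
lst2 <- lapply(split(df$date, df$ID), count_per_day, allDates = allDates)
out2 <- do.call(rbind, Map(cbind, ID = names(lst2), lst2))
row.names(out2) <- NULL
```

With this variant the Admission column sums to the number of rows in df, because each case is counted once under its own ID rather than once per ID.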

Or, using the tidyverse, do a grouped operation:

library(dplyr)
library(tidyr)
df %>%
  group_by(ID) %>% 
  summarise(out = M(date), .groups = 'drop') %>%
  unpack(out)

-output

# A tibble: 124 × 3
      ID day        Admission
   <int> <date>         <int>
 1     1 2016-01-01         1
 2     1 2016-01-02         0
 3     1 2016-01-03         0
 4     1 2016-01-04         0
 5     1 2016-01-05         1
 6     1 2016-01-06         0
 7     1 2016-01-07         1
 8     1 2016-01-08         0
 9     1 2016-01-09         0
10     1 2016-01-10         0
# … with 114 more rows
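
As a possible alternative that does not rely on M at all, the per-day counts can be built with count() from dplyr and the missing ID/day combinations filled in with complete() from tidyr. This is only a sketch under the assumption that every ID should cover the full date range of the data. The result name res is hypothetical, and the date column keeps the name date rather than day.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question.
set.seed(1453)
ID <- sample(1:4, 10, TRUE)
date <- as.Date(c("2016-01-01", "2016-01-05", "2016-01-07", "2016-01-12", "2016-01-16",
                  "2016-01-20", "2016-01-20", "2016-01-25", "2016-01-26", "2016-01-31"))
df <- data.frame(ID, date)

res <- df %>%
  # Count the cases observed for each ID on each date.
  count(ID, date, name = "Admission") %>%
  # Add every ID/day combination in the overall range, filling gaps with 0.
  complete(ID, date = seq(min(date), max(date), by = "day"),
           fill = list(Admission = 0L))
```

Because count() sees each row's own ID, a day with several cases for the same ID yields a count above 1, which matches the "up to 15 cases per day per ID" situation described in the question.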