Home > Enterprise >  How to compute total number per each month and year?
How to compute total number per each month and year?

Time:05-12

Hey I have a data frame of 2 columns: date and cases:

 date            cases
2021-05-01          2
2022-03-01          3
2021-05-21          8

what I want is to calculate the total number per cases in each month of each year. I've found that code:

 total_cases <- aggregate(cbind(cases)~month(date),
         data=datalab,FUN=sum)

but it is not separated by year. it calculates to total cases for each month but combines the MAY of 2021 with MAY of 2022 together and I want that separately.

CodePudding user response:

Using months (note the s), and strftime.

with(datalab, aggregate(datalab['cases'], list(month=months(date), year=strftime(date, '%Y')), FUN=sum))
#   month year cases
# 1   May 2021    10
# 2 March 2022     3

You could also use the 1st to 7th substring.

aggregate(datalab['cases'], list(month=substr(datalab$date, 1, 7)), FUN=sum)
#     month cases
# 1 2021-05    10
# 2 2022-03     3

Or, more concise, though not as nice.

aggregate(cases ~ substr(date, 1, 7), data=datalab, FUN=sum)
#   substr(date, 1, 7) cases
# 1            2021-05    10
# 2            2022-03     3

Update

To also aggregate by country, just put it in the second (i.e. the by=) list; it is referenced by the with.

with(datalab2, aggregate(datalab2['cases'], list(month=months(date), year=strftime(date, '%Y'),
                                                 country=country), FUN=sum))
#   month year country cases
# 1   May 2021       A    15
# 2 March 2022       A     5
# 3   May 2021       B    16
# 4 March 2022       B     6
# 5   May 2021       C    12
# 6 March 2022       C     5

Data:

datalab <- structure(list(date = c("2021-05-01", "2022-03-01", "2021-05-21"
), cases = c(2L, 3L, 8L)), class = "data.frame", row.names = c(NA, 
-3L))

datalab <- transform(datalab, date=as.Date(date))

datalab2 <- structure(list(date = structure(c(18748, 19052, 18768, 18748, 
19052, 18768, 18748, 19052, 18768), class = "Date"), country = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), levels = c("A", "B", "C"), class = "factor"), 
    cases = c(7L, 5L, 8L, 9L, 6L, 7L, 2L, 5L, 10L)), class = "data.frame", row.names = c(NA, 
-9L))
  • Related