Hey I have a data frame of 2 columns: date and cases:
date cases
2021-05-01 2
2022-03-01 3
2021-05-21 8
what I want is to calculate the total number per cases in each month of each year. I've found that code:
total_cases <- aggregate(cbind(cases)~month(date),
data=datalab,FUN=sum)
but it is not separated by year. it calculates to total cases for each month but combines the MAY of 2021 with MAY of 2022 together and I want that separately.
CodePudding user response:
Using months
(note the s), and strftime
.
with(datalab, aggregate(datalab['cases'], list(month=months(date), year=strftime(date, '%Y')), FUN=sum))
# month year cases
# 1 May 2021 10
# 2 March 2022 3
You could also use the 1st to 7th substr
ing.
aggregate(datalab['cases'], list(month=substr(datalab$date, 1, 7)), FUN=sum)
# month cases
# 1 2021-05 10
# 2 2022-03 3
Or, more concise, though not as nice.
aggregate(cases ~ substr(date, 1, 7), data=datalab, FUN=sum)
# substr(date, 1, 7) cases
# 1 2021-05 10
# 2 2022-03 3
Update
To also aggregate by country, just put it in the second (i.e. the by=
) list; it is referenced by the with
.
with(datalab2, aggregate(datalab2['cases'], list(month=months(date), year=strftime(date, '%Y'),
country=country), FUN=sum))
# month year country cases
# 1 May 2021 A 15
# 2 March 2022 A 5
# 3 May 2021 B 16
# 4 March 2022 B 6
# 5 May 2021 C 12
# 6 March 2022 C 5
Data:
datalab <- structure(list(date = c("2021-05-01", "2022-03-01", "2021-05-21"
), cases = c(2L, 3L, 8L)), class = "data.frame", row.names = c(NA,
-3L))
datalab <- transform(datalab, date=as.Date(date))
datalab2 <- structure(list(date = structure(c(18748, 19052, 18768, 18748,
19052, 18768, 18748, 19052, 18768), class = "Date"), country = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), levels = c("A", "B", "C"), class = "factor"),
cases = c(7L, 5L, 8L, 9L, 6L, 7L, 2L, 5L, 10L)), class = "data.frame", row.names = c(NA,
-9L))