I am wondering how I can find the sum of a column, (in this case it's the AgeGroup_20_to_24 column) for a month and year. Here's the sample data:
https://i.stack.imgur.com/E23Th.png
I essentially want to find the total amount of cases per month/year. For an example: 01/2020 = total sum cases of the AgeGroup 02/2020 = total sum cases of the AgeGroup
I tried doing this, however I get this:
https://i.stack.imgur.com/1eH0O.png
xAge20To24 <- covid%>%
mutate(dates=mdy(Date), year = year(dates), month = month(dates))%>%
mutate(total = sum(AgeGroup_20_to_24))%>%
select(Date, year, month, AgeGroup_20_to_24)%>%
group_by(year)
View(xAge20To24)
Any help will be appreciated.
structure(list(Date = c("3/9/2020", "3/10/2020", "3/11/2020", "3/12/2020", "3/13/2020", "3/14/2020"), AgeGroup_0_to_19 = c(1, 0, 2, 0, 0, 2), AgeGroup_20_to_24 = c(1, 0, 2, 0, 2, 1), AgeGroup_25_to_29 = c(1, 0, 1, 2, 2, 2), AgeGroup_30_to_34 = c(0, 0, 2, 3, 4, 3), AgeGroup_35_to_39 = c(3, 1, 2, 1, 2, 1), AgeGroup_40_to_44 = c(1, 2, 1, 3, 3, 1), AgeGroup_45_to_49 = c(1, 0, 0, 2, 0, 1), AgeGroup_50_to_54 = c(2, 1, 1, 1, 0, 1), AgeGroup_55_to_59 = c(1, 0, 1, 1, 1, 2), AgeGroup_60_to_64 = c(0, 2, 2, 1, 1, 3), AgeGroup_70_plus = c(2, 0, 2, 0, 0, 0)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
CodePudding user response:
I'm not sure if your question and your data match up. You're asking for by-month summaries of data, but your data only includes March entries. I've provided two examples of summarizing your data below, one that uses the entire date and one that uses by-day summaries since we can't use month. If your full data set has more months included, you can just swap the day for month instead. First, a quick summary of just the dates can be done with this code:
#### Load Library ####
library(tidyverse)
library(lubridate)
#### Pivot and Summarise Data ####
covid %>%
pivot_longer(cols = c(everything(),
-Date),
names_to = "AgeGroup",
values_to = "Cases") %>%
group_by(Date) %>%
summarise(Sum_Cases = sum(Cases))
This pivots your data into long format, groups by the entire date, then summarizes the cases, which gives you this by-date sum of data:
# A tibble: 6 × 2
Date Sum_Cases
<chr> <dbl>
1 3/10/2020 6
2 3/11/2020 16
3 3/12/2020 14
4 3/13/2020 15
5 3/14/2020 17
6 3/9/2020 13
Using the same pivot_longer
principle, you can mutate the data to date format like you already did, pivot to longer format, then group by day, thereafter summarizing the cases:
#### Theoretical Example ####
covid %>%
mutate(Date=mdy(Date),
Year = year(Date),
Month = month(Date),
Day = day(Date)) %>%
pivot_longer(cols = c(everything(),
-Date,-Year,-Month,-Day),
names_to = "AgeGroup",
values_to = "Cases") %>%
group_by(Day) %>% # use by day instead of month
summarise(Sum_Cases = sum(Cases))
Which you can see below. Here we can see the 14th had the most cases:
# A tibble: 6 × 2
Day Sum_Cases
<int> <dbl>
1 9 13
2 10 6
3 11 16
4 12 14
5 13 15
6 14 17