Home > front end >  How to find sum of a column given the date and month is the same
How to find sum of a column given the date and month is the same

Time:11-04

I am wondering how I can find the sum of a column, (in this case it's the AgeGroup_20_to_24 column) for a month and year. Here's the sample data:

https://i.stack.imgur.com/E23Th.png

I essentially want to find the total amount of cases per month/year. For an example: 01/2020 = total sum cases of the AgeGroup 02/2020 = total sum cases of the AgeGroup

I tried doing this, however I get this:

https://i.stack.imgur.com/1eH0O.png

xAge20To24 <- covid%>%
  mutate(dates=mdy(Date), year = year(dates), month = month(dates))%>%
  mutate(total = sum(AgeGroup_20_to_24))%>%
  select(Date, year, month, AgeGroup_20_to_24)%>%
  group_by(year)

View(xAge20To24)
  

Any help will be appreciated.

structure(list(Date = c("3/9/2020", "3/10/2020", "3/11/2020", "3/12/2020", "3/13/2020", "3/14/2020"), AgeGroup_0_to_19 = c(1, 0, 2, 0, 0, 2), AgeGroup_20_to_24 = c(1, 0, 2, 0, 2, 1), AgeGroup_25_to_29 = c(1, 0, 1, 2, 2, 2), AgeGroup_30_to_34 = c(0, 0, 2, 3, 4, 3), AgeGroup_35_to_39 = c(3, 1, 2, 1, 2, 1), AgeGroup_40_to_44 = c(1, 2, 1, 3, 3, 1), AgeGroup_45_to_49 = c(1, 0, 0, 2, 0, 1), AgeGroup_50_to_54 = c(2, 1, 1, 1, 0, 1), AgeGroup_55_to_59 = c(1, 0, 1, 1, 1, 2), AgeGroup_60_to_64 = c(0, 2, 2, 1, 1, 3), AgeGroup_70_plus = c(2, 0, 2, 0, 0, 0)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

CodePudding user response:

I'm not sure if your question and your data match up. You're asking for by-month summaries of data, but your data only includes March entries. I've provided two examples of summarizing your data below, one that uses the entire date and one that uses by-day summaries since we can't use month. If your full data set has more months included, you can just swap the day for month instead. First, a quick summary of just the dates can be done with this code:

#### Load Library ####
library(tidyverse)
library(lubridate)

#### Pivot and Summarise Data ####
covid %>%   
  pivot_longer(cols = c(everything(),
                        -Date),
               names_to = "AgeGroup",
               values_to = "Cases") %>% 
  group_by(Date) %>% 
  summarise(Sum_Cases = sum(Cases))

This pivots your data into long format, groups by the entire date, then summarizes the cases, which gives you this by-date sum of data:

# A tibble: 6 × 2
  Date      Sum_Cases
  <chr>         <dbl>
1 3/10/2020         6
2 3/11/2020        16
3 3/12/2020        14
4 3/13/2020        15
5 3/14/2020        17
6 3/9/2020         13

Using the same pivot_longer principle, you can mutate the data to date format like you already did, pivot to longer format, then group by day, thereafter summarizing the cases:

#### Theoretical Example ####
covid %>%
  mutate(Date=mdy(Date), 
         Year = year(Date),
         Month = month(Date),
         Day = day(Date)) %>% 
  pivot_longer(cols = c(everything(),
                        -Date,-Year,-Month,-Day),
               names_to = "AgeGroup",
               values_to = "Cases") %>% 
  group_by(Day) %>% # use by day instead of month
  summarise(Sum_Cases = sum(Cases))

Which you can see below. Here we can see the 14th had the most cases:

# A tibble: 6 × 2
    Day Sum_Cases
  <int>     <dbl>
1     9        13
2    10         6
3    11        16
4    12        14
5    13        15
6    14        17
  •  Tags:  
  • r
  • Related