Home > database >  Summing multiple items in a column
Summing multiple items in a column

Time:12-04

Type                Date        Cost
Shampoo          01/31/2022     $10
Shampoo          01/31/2022     $15
Shampoo          02/22/2019     $15
Conditioner      03/15/2020     $17
Conditioner      05/16/2022     $19
Soap.            01/31/2021     $5
Soap             01/06/2022     $2
Soap             12/31/2019     $3
Soap             10/10/2022     $5

How would I approach summing total cost for specific items in a year, months, quarter and total cost

Example Output:

Type     | Number Items | Year | Total Cost 
Shampoo  |   2          | 2022 |    25
Shampoo  |  1.          | 2019 |   15

etc...

split by month, and quarter

Trying summarize and library(lubridate)

CodePudding user response:

library(tidyverse)
library(lubridate)

df %>%
   group_by(Type, Date)%>%
   summarise(Number_Items = n(),
             Year = year(mdy(Date[1])),
             Total_Cost = sum(parse_number(Cost)),
             .groups = 'drop')

# A tibble: 8 × 5
  Type        Date       Number_Items  Year Total_Cost
  <chr>       <chr>             <int> <dbl>      <dbl>
1 Conditioner 03/15/2020            1  2020         17
2 Conditioner 05/16/2022            1  2022         19
3 Shampoo     01/31/2022            2  2022         25
4 Shampoo     02/22/2019            1  2019         15
5 Soap        01/06/2022            1  2022          2
6 Soap        10/10/2022            1  2022          5
7 Soap        12/31/2019            1  2019          3
8 Soap.       01/31/2021            1  2021          5

CodePudding user response:

You have to group your data by type and year, and then count the number of items and total cost, here an example for you to adapt.

library(dplyr)
library(lubridate)

your_data_frame %>% 
  group_by(type, year = year(dmy(Date))) %>% 
  summarise(
    number_of_items = n(),
    total_cost = sum(cost,na.rm = TRUE)
  )

CodePudding user response:

This one is very similar to @onyambu's solution. But it differs in the grouping:

library(dplyr)
library(readr) # parse_number()
library(lubridate

df %>% 
  mutate(Year = year(mdy(Date))) %>% 
  group_by(Year, Type) %>% 
  summarise(`Number Items` = n(), 
            `Total Cost` = sum(parse_number(Cost)))
     Year Type        `Number Items` `Total Cost`
  <dbl> <chr>                <int>        <dbl>
1  2019 Shampoo                  1           15
2  2019 Soap                     1            3
3  2020 Conditioner              1           17
4  2021 Soap.                    1            5
5  2022 Conditioner              1           19
6  2022 Shampoo                  2           25
7  2022 Soap                     2            7
  •  Tags:  
  • r
  • Related