I have a large dataframe of this structure:
structure(list(Date = structure(c(18220, 18220, 18220,
18955, 19110, 19110, 18514, 18514, 18892, 18647, 18647, 18528,
18822, 18822, 18822, 18822, 18745, 18745, 18745), class = "Date"),
Type = c("Main", "Main", "Main",
"Main", "Secondary", "Tri-Annual", "Main",
"Main", "Tri-Annual", "Main", "Syndication",
"Main", "Tri-Annual", "Syndication", "Secondary",
"Main", "Main", "Tri-Annual", "Secondary"
), Value = c(4000, 2250, 2250, 1100, 1800, 12000,
8000, 9000, 10000, 6500, 7000, 6500, 7000, 4250, 5500, 2500,
6000, 6000, 4500), buckets = c("Long", "Long", "Long",
"Long", "Long", "Medium", "Medium", "Long", "Medium", "Long",
"Long", "Long", "Long", "Long", "Long", "Long", "Long", "Long",
"Long")), row.names = c(NA, -19L), class = c("tbl_df", "tbl",
"data.frame"))
I'm trying to aggregate the data by Type and Date using:
df <- df %>% group_by(Date, Type) %>% dplyr::summarise(Aggregated = sum(df$Value))
But instead of giving the sum of each Type's Value on each day, it sums all the Values together (regardless of day) and posts it in the Aggregated column. Does anyone know what's wrong with this?
CodePudding user response:
Check the use of your variable call.
If you use df$Value as part of your summarise()
call, the sum will be built on the vector of that object. This means you "leave" the pipe and look for an object (happens to be the same, i.e. df) and take its column Value), instead of summarising the grouped tibble/dataframe.
df %>%
group_by(Date, Type) %>%
summarise(N = n() # just to count we have several entries per Date and Type
, aggregated_old = sum(df$Value) # you call the dataframe column
, aggregated_new = sum(Value) # do not use object df, just column name
)
# A tibble: 16 × 5
# Groups: Date [9]
Date Type N aggregated_old aggregated_new
<date> <chr> <int> <dbl> <dbl>
1 2019-11-20 Main 3 106150 8500
2 2020-09-09 Main 2 106150 17000
3 2020-09-23 Main 1 106150 6500
4 2021-01-20 Main 1 106150 6500
5 2021-01-20 Syndication 1 106150 7000
6 2021-04-28 Main 1 106150 6000
7 2021-04-28 Secondary 1 106150 4500
8 2021-04-28 Tri-Annual 1 106150 6000
9 2021-07-14 Main 1 106150 2500
10 2021-07-14 Secondary 1 106150 5500
11 2021-07-14 Syndication 1 106150 4250
12 2021-07-14 Tri-Annual 1 106150 7000
13 2021-09-22 Tri-Annual 1 106150 10000
14 2021-11-24 Main 1 106150 1100
15 2022-04-28 Secondary 1 106150 1800
16 2022-04-28 Tri-Annual 1 106150 12000