Calculate the sum in a data.frame (long format)

Time:12-09

I want to calculate the sum of value in this data.frame for the years 2005, 2006, 2007 and the categories a, b, c.

year <- c(2005,2005,2005,2006,2006,2006,2007,2007,2007)
category <- c("a","a","a","b","b","b","c","c","c")
value <- c(3,6,8,9,7,4,5,8,9)
df <- data.frame(year, category,value, stringsAsFactors = FALSE)

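The table should look like this, with the sums appended as extra rows (the values here are only illustrative and do not match df above):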

year category value
2005 a 1
2005 a 1
2005 a 1
2006 b 2
2006 b 2
2006 b 2
2007 c 3
2007 c 3
2007 c 3
2005 a 3
2006 b 6
2007 c 9

Any idea how this could be implemented? Maybe with add_row or cbind?

CodePudding user response:

How about this, using the dplyr package:

library(dplyr)  # provides %>%, group_by(), summarise() and mutate()

df %>%
  group_by(year, category) %>% 
  summarise(sum = sum(value))
# # A tibble: 3 × 3
# # Groups:   year [3]
#    year category   sum
#   <dbl> <chr>    <dbl>
# 1  2005 a           17
# 2  2006 b           20
# 3  2007 c           22

If you would rather keep every row and add the sum as a new column instead of collapsing each group to one row, replace summarise() with mutate():

df %>% 
  group_by(year, category) %>% 
  mutate(sum = sum(value))
# # A tibble: 9 × 4
# # Groups:   year, category [3]
#    year category value   sum
#   <dbl> <chr>    <dbl> <dbl>
# 1  2005 a            3    17
# 2  2005 a            6    17
# 3  2005 a            8    17
# 4  2006 b            9    20
# 5  2006 b            7    20
# 6  2006 b            4    20
# 7  2007 c            5    22
# 8  2007 c            8    22
# 9  2007 c            9    22
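
Neither snippet above appends the totals as extra rows, which is the layout shown in the question. Here is a minimal sketch of one way to get there with dplyr, assuming df is the data frame from the question. The names totals and result are introduced here only for illustration:

```r
library(dplyr)

# Data from the question
df <- data.frame(
  year     = c(2005, 2005, 2005, 2006, 2006, 2006, 2007, 2007, 2007),
  category = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  value    = c(3, 6, 8, 9, 7, 4, 5, 8, 9),
  stringsAsFactors = FALSE
)

# One row per year/category holding the group sum. Naming the column
# `value` lets it line up with the original column when rows are stacked.
totals <- df %>%
  group_by(year, category) %>%
  summarise(value = sum(value), .groups = "drop")

# Stack the original rows on top of the totals
result <- bind_rows(df, totals)
```

With this data, result should have the nine original rows followed by three total rows. Dropping the grouping with .groups = "drop" is a choice made here so that totals is a plain, ungrouped tibble before binding.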

CodePudding user response:

A base R solution using aggregate():

rbind( df, aggregate( value ~ year + category, df, sum ) )

   year category value
1  2005        a     3
2  2005        a     6
3  2005        a     8
4  2006        b     9
5  2006        b     7
6  2006        b     4
7  2007        c     5
8  2007        c     8
9  2007        c     9
10 2005        a    17
11 2006        b    20
12 2007        c    22
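
The one-liner can also be unpacked into two steps, which makes it easier to see why rbind() accepts the result: with the formula interface, aggregate() returns a data frame whose columns carry the same names as df. This is a sketch assuming the df from the question. The names totals and result are illustrative only:

```r
# Data from the question
df <- data.frame(
  year     = c(2005, 2005, 2005, 2006, 2006, 2006, 2007, 2007, 2007),
  category = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  value    = c(3, 6, 8, 9, 7, 4, 5, 8, 9),
  stringsAsFactors = FALSE
)

# Step 1: sum `value` within each year/category combination
totals <- aggregate(value ~ year + category, data = df, FUN = sum)

# Step 2: the column names match, so the totals can be stacked under df
stopifnot(identical(names(totals), names(df)))
result <- rbind(df, totals)
```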