Aggregate multiple variables with rbind

I would like to aggregate the values of several counties per year and add the totals as new rows with the county name "Florida". I have tried it with rbind and aggregate. It works with one value column, but not with several.

year <- c(2005,2006,2007,2005,2006,2007,2005,2006,2007)
county <- c("Alachua County","Alachua County","Alachua County","Baker County","Baker County","Baker County","Bay County","Bay County","Bay County")
value1 <- c(3,6,8,9,8,4,5,8,10)
value2 <- c(3,6,8,9,8,4,5,8,10)
value3 <- c(3,6,8,9,8,4,5,8,10)
value4 <- c(3,6,8,9,8,4,5,8,10)
df <- data.frame(year, county, value1, value2, value3, value4, stringsAsFactors = FALSE)

The result should look like this:

year	county	value1	value2	value3	value4
2005	Alachua County	3	3	3	3
2006	Alachua County	6	6	6	6
2007	Alachua County	8	8	8	8
2005	Baker County	9	9	9	9
2006	Baker County	8	8	8	8
2007	Baker County	4	4	4	4
2005	Bay County	5	5	5	5
2006	Bay County	8	8	8	8
2007	Bay County	10	10	10	10
2005	Florida	17	17	17	17
2006	Florida	22	22	22	22
2007	Florida	22	22	22	22

I tried this:

library(dplyr)

df <- df %>%
  group_by(year, county)

df <- rbind(df, aggregate (value1,value2,value3,value4) ~ year, df, FUN = sum)

and get the following error:

argument 2 must be a data frame or a named atomic vector.

Thanks for any help!
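The error most likely comes from how R parses the attempt: `~` binds more loosely than a function call, so `aggregate (value1,value2,value3,value4) ~ year` becomes an unevaluated formula, and rbind() is handed that formula as its second argument instead of a data frame. A minimal base-R sketch of the intended call, assuming the goal is one summed row per year: wrap the value columns in cbind() on the left-hand side of the formula and pass the data frame via `data =`. The group_by() step is not needed here, and the names `totals` and `result` are only illustrative.

```r
# Example data from the question
year <- c(2005,2006,2007,2005,2006,2007,2005,2006,2007)
county <- c("Alachua County","Alachua County","Alachua County","Baker County","Baker County","Baker County","Bay County","Bay County","Bay County")
value1 <- c(3,6,8,9,8,4,5,8,10)
value2 <- c(3,6,8,9,8,4,5,8,10)
value3 <- c(3,6,8,9,8,4,5,8,10)
value4 <- c(3,6,8,9,8,4,5,8,10)
df <- data.frame(year, county, value1, value2, value3, value4, stringsAsFactors = FALSE)

# Sum all four value columns per year; cbind() on the left-hand side
# of the formula is how aggregate() takes several response columns
totals <- aggregate(cbind(value1, value2, value3, value4) ~ year, data = df, FUN = sum)

# Label the yearly totals as a pseudo-county
totals$county <- "Florida"

# Put the columns in the same order as df, then append the totals below
result <- rbind(df, totals[names(df)])
result
```

Because aggregate() returns the groups sorted by year and rbind() appends them after the original rows, the "Florida" rows end up at the bottom, as in the desired output above.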

Answer:

Here is a data.table approach, using janitor to calculate the yearly totals:

library(data.table)
# make df a data.table (in place)
setDT(df)
# split by year
L <- split(df, by = "year", keep.by = FALSE)
# calculate totals by year
L <- lapply(L, janitor::adorn_totals, name = "Florida")
# row-bind L together into a single data.table, restoring year from the list names
rbindlist(L, use.names = TRUE, idcol = "year")
#    year         county value1 value2 value3 value4
# 1: 2005 Alachua County      3      3      3      3
# 2: 2005   Baker County      9      9      9      9
# 3: 2005     Bay County      5      5      5      5
# 4: 2005        Florida     17     17     17     17
# 5: 2006 Alachua County      6      6      6      6
# 6: 2006   Baker County      8      8      8      8
# 7: 2006     Bay County      8      8      8      8
# 8: 2006        Florida     22     22     22     22
# 9: 2007 Alachua County      8      8      8      8
#10: 2007   Baker County      4      4      4      4
#11: 2007     Bay County     10     10     10     10
#12: 2007        Florida     22     22     22     22
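If you would rather not depend on janitor, the yearly totals can also be computed with data.table's own grouping. A minimal sketch, assuming the value columns are exactly those whose names start with "value"; `dt`, `v`, `value_cols`, `totals` and `result` are illustrative names, not part of the answer above.

```r
library(data.table)

# Rebuild the question's data (value1..value4 are identical there)
v <- c(3, 6, 8, 9, 8, 4, 5, 8, 10)
dt <- data.table(
  year   = rep(2005:2007, times = 3),
  county = rep(c("Alachua County", "Baker County", "Bay County"), each = 3),
  value1 = v, value2 = v, value3 = v, value4 = v
)

# Assumption: every column starting with "value" should be summed
value_cols <- grep("^value", names(dt), value = TRUE)

# One row per year with each value column summed over the counties
totals <- dt[, lapply(.SD, sum), by = year, .SDcols = value_cols]
totals[, county := "Florida"]

# use.names = TRUE matches columns by name, since totals has a different column order
result <- rbind(dt, totals, use.names = TRUE)
setorder(result, year, county)
result
```

setorder() sorts county alphabetically, so "Florida" lands after the three counties within each year, matching the output shown above.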

Answer:

A dplyr approach would be:

library(dplyr)

df %>%
  bind_rows(df %>% 
            group_by(year) %>%
            summarize(county = 'Florida', across(starts_with('value'), sum))) %>%
  arrange(year, county)
#>    year         county value1 value2 value3 value4
#> 1  2005 Alachua County      3      3      3      3
#> 2  2005   Baker County      9      9      9      9
#> 3  2005     Bay County      5      5      5      5
#> 4  2005        Florida     17     17     17     17
#> 5  2006 Alachua County      6      6      6      6
#> 6  2006   Baker County      8      8      8      8
#> 7  2006     Bay County      8      8      8      8
#> 8  2006        Florida     22     22     22     22
#> 9  2007 Alachua County      8      8      8      8
#> 10 2007   Baker County      4      4      4      4
#> 11 2007     Bay County     10     10     10     10
#> 12 2007        Florida     22     22     22     22