Aggregate multiple variables with rbind

Time:03-25

I would like to sum several variables across counties and append the totals as new rows labeled "Florida". I have tried this with rbind and aggregate. It works with one variable, but not with several.

year <- c(2005,2006,2007,2005,2006,2007,2005,2006,2007)
county <- c("Alachua County","Alachua County","Alachua County","Baker County","Baker County","Baker County","Bay County","Bay County","Bay County")
value1 <- c(3,6,8,9,8,4,5,8,10)
value2 <- c(3,6,8,9,8,4,5,8,10)
value3 <- c(3,6,8,9,8,4,5,8,10)
value4 <- c(3,6,8,9,8,4,5,8,10)
df <- data.frame(year, county,value1,value2,value3,value4, stringsAsFactors = FALSE)


The result should look like this:

year county         value1 value2 value3 value4
2005 Alachua County      3      3      3      3
2006 Alachua County      6      6      6      6
2007 Alachua County      8      8      8      8
2005 Baker County        9      9      9      9
2006 Baker County        8      8      8      8
2007 Baker County        4      4      4      4
2005 Bay County          5      5      5      5
2006 Bay County          8      8      8      8
2007 Bay County         10     10     10     10
2005 Florida            17     17     17     17
2006 Florida            22     22     22     22
2007 Florida            22     22     22     22

I tried this:

df<-df %>% 
  group_by(year, county)

df<-rbind(df, aggregate (value1,value2,value3,value4) ~ year, df, FUN = sum)

and I get the following error:

argument 2 must be a data frame or a named atomic vector.

Thanks for your help!

CodePudding user response:

Here is a data.table approach, using janitor to calculate the yearly totals:

library(data.table)
# make df a data.table (by reference)
setDT(df)
# split by year
L <- split(df, by = "year", keep.by = FALSE)
# calculate totals by year
L <- lapply(L, janitor::adorn_totals, name = "Florida")
# rowbind L together to a single data.table
# idcol stores the list names (the years) in a new "year" column
rbindlist(L, use.names = TRUE, idcol = "year")
#    year         county value1 value2 value3 value4
# 1: 2005 Alachua County      3      3      3      3
# 2: 2005   Baker County      9      9      9      9
# 3: 2005     Bay County      5      5      5      5
# 4: 2005        Florida     17     17     17     17
# 5: 2006 Alachua County      6      6      6      6
# 6: 2006   Baker County      8      8      8      8
# 7: 2006     Bay County      8      8      8      8
# 8: 2006        Florida     22     22     22     22
# 9: 2007 Alachua County      8      8      8      8
#10: 2007   Baker County      4      4      4      4
#11: 2007     Bay County     10     10     10     10
#12: 2007        Florida     22     22     22     22

CodePudding user response:

A dplyr approach would be:

library(dplyr)

# append one "Florida" row per year holding the column sums, then sort
df %>%
  bind_rows(df %>% 
            group_by(year) %>%
            summarize(county = 'Florida', across(starts_with('value'), sum))) %>%
  arrange(year, county)
#>    year         county value1 value2 value3 value4
#> 1  2005 Alachua County      3      3      3      3
#> 2  2005   Baker County      9      9      9      9
#> 3  2005     Bay County      5      5      5      5
#> 4  2005        Florida     17     17     17     17
#> 5  2006 Alachua County      6      6      6      6
#> 6  2006   Baker County      8      8      8      8
#> 7  2006     Bay County      8      8      8      8
#> 8  2006        Florida     22     22     22     22
#> 9  2007 Alachua County      8      8      8      8
#> 10 2007   Baker County      4      4      4      4
#> 11 2007     Bay County     10     10     10     10
#> 12 2007        Florida     22     22     22     22
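
For completeness, the original rbind/aggregate attempt can also be made to work in base R. In the question's call, the parenthesis after aggregate closes before the formula, so rbind receives a formula rather than a data frame as its second argument, which is most likely what triggers the error. The following is a minimal sketch, assuming the ungrouped df from the question; the names totals and result are only illustrative.

```r
# Sample data from the question
year   <- c(2005, 2006, 2007, 2005, 2006, 2007, 2005, 2006, 2007)
county <- rep(c("Alachua County", "Baker County", "Bay County"), each = 3)
value  <- c(3, 6, 8, 9, 8, 4, 5, 8, 10)
df <- data.frame(year, county,
                 value1 = value, value2 = value,
                 value3 = value, value4 = value,
                 stringsAsFactors = FALSE)

# cbind() on the left-hand side of the formula lets aggregate()
# sum several columns at once, grouped by year
totals <- aggregate(cbind(value1, value2, value3, value4) ~ year,
                    data = df, FUN = sum)

# Label the summed rows and put the columns in the same order as df
totals$county <- "Florida"
totals <- totals[names(df)]

# Append the yearly totals below the county rows
result <- rbind(df, totals)
result
```

This keeps the county rows in their original order and places the three Florida rows at the end, as in the desired output of the question. Add result[order(result$year, result$county), ] if the totals should follow each year's counties instead.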