I would like to aggregate several variables across all counties and add the totals as new rows labelled "Florida". I have tried it with rbind. It works with one variable, but not with several.
year <- c(2005,2006,2007,2005,2006,2007,2005,2006,2007)
county <- c("Alachua County","Alachua County","Alachua County","Baker County","Baker County","Baker County","Bay County","Bay County","Bay County")
value1 <- c(3,6,8,9,8,4,5,8,10)
value2 <- c(3,6,8,9,8,4,5,8,10)
value3 <- c(3,6,8,9,8,4,5,8,10)
value4 <- c(3,6,8,9,8,4,5,8,10)
df <- data.frame(year, county,value1,value2,value3,value4, stringsAsFactors = FALSE)
The result should look like this:
year | county | value1 | value2 | value3 | value4 |
---|---|---|---|---|---|
2005 | Alachua County | 3 | 3 | 3 | 3 |
2006 | Alachua County | 6 | 6 | 6 | 6 |
2007 | Alachua County | 8 | 8 | 8 | 8 |
2005 | Baker County | 9 | 9 | 9 | 9 |
2006 | Baker County | 8 | 8 | 8 | 8 |
2007 | Baker County | 4 | 4 | 4 | 4 |
2005 | Bay County | 5 | 5 | 5 | 5 |
2006 | Bay County | 8 | 8 | 8 | 8 |
2007 | Bay County | 10 | 10 | 10 | 10 |
2005 | Florida | 17 | 17 | 17 | 17 |
2006 | Florida | 22 | 22 | 22 | 22 |
2007 | Florida | 22 | 22 | 22 | 22 |
I tried this:
df<-df %>%
group_by(year, county)
df<-rbind(df, aggregate (value1,value2,value3,value4) ~ year, df, FUN = sum)
and I get the following error:
argument 2 must be a data frame or a named atomic vector.
Thanks for your help!
CodePudding user response:
Here is a data.table approach, using janitor to calculate the yearly totals:
library(data.table)
# make df a data.table
setDT(df)
# split by year
L <- split(df, by = "year", keep.by = FALSE)
# calculate totals by year (requires the janitor package)
L <- lapply(L, janitor::adorn_totals, name = "Florida")
# rowbind L together to a single data.table
rbindlist(L, use.names = TRUE, idcol = "year")
# year county value1 value2 value3 value4
# 1: 2005 Alachua County 3 3 3 3
# 2: 2005 Baker County 9 9 9 9
# 3: 2005 Bay County 5 5 5 5
# 4: 2005 Florida 17 17 17 17
# 5: 2006 Alachua County 6 6 6 6
# 6: 2006 Baker County 8 8 8 8
# 7: 2006 Bay County 8 8 8 8
# 8: 2006 Florida 22 22 22 22
# 9: 2007 Alachua County 8 8 8 8
#10: 2007 Baker County 4 4 4 4
#11: 2007 Bay County 10 10 10 10
#12: 2007 Florida 22 22 22 22
CodePudding user response:
A dplyr approach would be:
library(dplyr)
# sum every value* column per year, label those rows "Florida", and append them
df %>%
  bind_rows(df %>%
              group_by(year) %>%
              summarize(county = 'Florida', across(starts_with('value'), sum))) %>%
  arrange(year, county)
#> year county value1 value2 value3 value4
#> 1 2005 Alachua County 3 3 3 3
#> 2 2005 Baker County 9 9 9 9
#> 3 2005 Bay County 5 5 5 5
#> 4 2005 Florida 17 17 17 17
#> 5 2006 Alachua County 6 6 6 6
#> 6 2006 Baker County 8 8 8 8
#> 7 2006 Bay County 8 8 8 8
#> 8 2006 Florida 22 22 22 22
#> 9 2007 Alachua County 8 8 8 8
#> 10 2007 Baker County 4 4 4 4
#> 11 2007 Bay County 10 10 10 10
#> 12 2007 Florida 22 22 22 22
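The rbind and aggregate attempt from the question can also be made to work in base R. In the formula interface of aggregate, several response columns have to be wrapped in cbind() on the left-hand side. Passing them as separate arguments is what triggers the error. A minimal sketch, assuming df is a plain data.frame (not grouped with group_by); the names totals and result are only for illustration:

```r
# Sample data from the question
year   <- c(2005, 2006, 2007, 2005, 2006, 2007, 2005, 2006, 2007)
county <- rep(c("Alachua County", "Baker County", "Bay County"), each = 3)
value1 <- c(3, 6, 8, 9, 8, 4, 5, 8, 10)
df <- data.frame(year, county, value1, value2 = value1, value3 = value1,
                 value4 = value1, stringsAsFactors = FALSE)

# Sum all four value columns per year; cbind() groups them on the left-hand side
totals <- aggregate(cbind(value1, value2, value3, value4) ~ year,
                    data = df, FUN = sum)

# Label the total rows and put the columns in the same order as df
totals$county <- "Florida"
result <- rbind(df, totals[names(df)])

# Sort so each year's total follows its counties
result <- result[order(result$year, result$county), ]
rownames(result) <- NULL
result
```

This needs no extra packages. Compared with the original layout, the only difference is that the rows are sorted by year first; drop the order() step to keep the Florida rows at the bottom.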