Home > database >  Subtract rows and create a new row name
Subtract rows and create a new row name

Time:03-29

I would like to subtract Bay County from Florida in this data frame and create a new row with the name "Florida (-Bay County)".

Maybe group_modify and add_row (dplyr) would be a possibility?

year <- c(2005,2006,2007,2005,2006,2007,2005,2006,2007,2005,2006,2007)
county <- c("Alachua County","Alachua County","Alachua County","Baker County","Baker County","Baker County","Bay County","Bay County","Bay County","Florida","Florida","Florida")
pop <- c(3,6,8,9,8,4,5,8,10,17,22,22)
gdp <- c(3,6,8,9,8,4,5,8,10,17,22,22)
area <- c(3,6,8,9,8,4,5,8,10,17,22,22)
density<-c(3,6,8,9,8,4,5,8,10,17,22,22)
df <- data.frame(year, county,pop,gdp,area,density, stringsAsFactors = FALSE)
year county pop gdp area density
2005 Alachua County 3 3 3 3
2005 Baker County 9 9 9 9
2005 Bay County 5 5 5 5
2005 Florida 17 17 17 17
2005 Florida (-Bay County) 12 12 12 12
2006 Alachua County 6 6 6 6
2006 Baker County 8 8 8 8
2006 Bay County 8 8 8 8
2006 Florida 22 22 22 22
2006 Florida (-Bay County) 14 14 14 14
2007 Alachua County 8 8 8 8
2007 Baker County 4 4 4 4
2007 Bay County 10 10 10 10
2007 Florida 22 22 22 22
2007 Florida (-Bay County) 12 12 12 12

CodePudding user response:

You could do:

df %>% 
  filter(county != 'Florida' & county != 'Bay County') %>%
  group_by(year) %>%
  bind_rows(summarise(., county = 'Florida (-Bay County)', 
                      across(where(is.numeric), sum))) %>%
  arrange(year)
#> # A tibble: 9 x 6
#> # Groups:   year [3]
#>    year county                  pop   gdp  area density
#>   <dbl> <chr>                 <dbl> <dbl> <dbl>   <dbl>
#> 1  2005 Alachua County            3     3     3       3
#> 2  2005 Baker County              9     9     9       9
#> 3  2005 Florida (-Bay County)    12    12    12      12
#> 4  2006 Alachua County            6     6     6       6
#> 5  2006 Baker County              8     8     8       8
#> 6  2006 Florida (-Bay County)    14    14    14      14
#> 7  2007 Alachua County            8     8     8       8
#> 8  2007 Baker County              4     4     4       4
#> 9  2007 Florida (-Bay County)    12    12    12      12

CodePudding user response:

If you wanted to try something with group_modify and add_row, you could consider something like this. Here, when using add_row, use map to sum up the data within the group, but not including "Florida" or "Bay County".

library(tidyverse)

df %>%
  group_by(year) %>%
  group_modify(
    ~ .x %>%
      add_row(
        county = "Florida (-Bay County)",
        !!! map(.x %>% 
                  filter(!county %in% c("Florida", "Bay County")) %>%
                  select(-county),
                sum)
        )
  )

Output

    year county                  pop   gdp  area density
   <dbl> <chr>                 <dbl> <dbl> <dbl>   <dbl>
 1  2005 Alachua County            3     3     3       3
 2  2005 Baker County              9     9     9       9
 3  2005 Bay County                5     5     5       5
 4  2005 Florida                  17    17    17      17
 5  2005 Florida (-Bay County)    12    12    12      12
 6  2006 Alachua County            6     6     6       6
 7  2006 Baker County              8     8     8       8
 8  2006 Bay County                8     8     8       8
 9  2006 Florida                  22    22    22      22
10  2006 Florida (-Bay County)    14    14    14      14
11  2007 Alachua County            8     8     8       8
12  2007 Baker County              4     4     4       4
13  2007 Bay County               10    10    10      10
14  2007 Florida                  22    22    22      22
15  2007 Florida (-Bay County)    12    12    12      12
  • Related