Group and add variable of type stock and another type in a single step?-CodePudding

I want to group by district summing 'incoming' values at quarter and get the value of the 'stock' in the last quarter (3) in just one step. 'stock' can not summed through quarters. My example dataframe:

library(dplyr)
df <- data.frame ("district"= rep(c("ARA", "BJI", "CMC"), each=3),
                  "quarter"=rep(1:3,3),
                  "incoming"= c(4044, 2992, 2556, 1639, 9547, 1191,2038,1942,225),
                  "stock"= c(19547,3160, 1533,5355,6146,355,5816,1119,333)
                  )
df

  district quarter incoming stock
1      ARA       1     4044 19547
2      ARA       2     2992  3160
3      ARA       3     2556  1533
4      BJI       1     1639  5355
5      BJI       2     9547  6146
6      BJI       3     1191   355
7      CMC       1     2038  5816
8      CMC       2     1942  1119
9      CMC       3      225   333

The actual dataframe has ~45.000 rows and 41 variables of which 8 are of type stock.

The result should be:

# A tibble: 3 × 3
  district stock incoming
  <chr>    <dbl>    <dbl>
1 ARA       1533     9592
2 BJI        355    12377
3 CMC        333     4205

I know how to get to the result but in three steps and I don't think it's efficient and error prone due to the data.

My approach:


basea <- df %>% 
  group_by(district) %>% 
  filter(quarter==3) %>% #take only the last quarter
  summarise(across(stock, sum)) %>% 
 
baseb <- df %>% 
  group_by(district) %>% 
  summarise(across(incoming, sum)) %>% 

final <- full_join(basea, baseb)

Does anyone have any suggestions to perform the procedure in one (or at least two) steps? Grateful, Modus

CodePudding user response：

Given that the dataset only has 3 quarters and not 4. If that's not the case use nth(3) instead of last()

library(tidyverse)

df %>% 
  group_by(district) %>%  
  summarise(stock = last(stock), 
            incoming = sum(incoming))

# A tibble: 3 × 3
  district stock incoming
  <chr>    <dbl>    <dbl>
1 ARA       1533     9592
2 BJI        355    12377
3 CMC        333     4205

CodePudding user response：

here is a data.table approach

library(data.table)
setDT(df)[, .(incoming = sum(incoming), stock = stock[.N]), by = .(district)]

   district incoming stock
1:      ARA     9592  1533
2:      BJI    12377   355
3:      CMC     4205   333

CodePudding user response：

Here's a refactor that removes some of the duplicated code. This also seems like a prime use-case for creating a custom function that can be QC'd and maintained easier:

library(dplyr)

df <- data.frame ("district"= rep(c("ARA", "BJI", "CMC"), each=3),
                  "quarter"=rep(1:3,3),
                  "incoming"= c(4044, 2992, 2556, 1639, 9547, 1191,2038,1942,225),
                  "stock"= c(19547,3160, 1533,5355,6146,355,5816,1119,333)
)

aggregate_stocks <- function(df, n_quarter) {
  
  base <- df %>% 
    group_by(district)
  
  basea <- base %>% 
    filter(quarter == n_quarter) %>% 
    summarise(across(stock, sum)) 
  
  baseb <- base %>% 
    summarise(across(incoming, sum)) 
  
  final <- full_join(basea, baseb, by = "district")
  
  return(final)
  
}

aggregate_stocks(df, 3)
#> # A tibble: 3 × 3
#>   district stock incoming
#>   <chr>    <dbl>    <dbl>
#> 1 ARA       1533     9592
#> 2 BJI        355    12377
#> 3 CMC        333     4205