Home > database >  R: aggregate rows to transform monthly into quarterly data
R: aggregate rows to transform monthly into quarterly data

Time:03-17

I have a dataframe with monthly observations. Each row represents a month. I need to transform it into a dataframe with quarterly observations. In other words, I need to aggregate months 1-3 to form Q1, 4-6 for Q2, etc.

survey1 <- data.frame("col1" = c(10, 10, 10, 20, 20, 20),
                     "col2" = c(10, 15, 12, 30, 33, 35),
                     "col3" = c(12, 12, 12, 22, 22, 22))

It gets trickier since I want to aggregate columns with different operations.

Column1 and column 3 are constant during each quarter: they have the same values for month 1, month 2, and month 3.

On the other hand, column 2 does record different values for each month. I would like to sum up all the values of column 2 for each quarterly observation of the new dataframe.

In other words, I want to get from survey1 above to survey2 below.

survey2 <- data.frame("col1" = c(10, 20),
                     "col2" = c(37, 98),
                     "col3" = c(12, 22))

Thanks

CodePudding user response:

You may divide every 3 rows into one quarter and calculate the summarising statistics for each variable. Since col1 and col3 have the same value in each quarter we can select any value from those columns, I have selected the 1st one.

library(dplyr)

survey1 %>%
  group_by(Quarter = paste0('Q', ceiling(row_number()/3))) %>%
  summarise(across(c(col1, col3), first), 
            col2 = sum(col2)) %>%
  select(Quarter, col1, col2, col3)

#  Quarter  col1  col2  col3
#  <chr>   <dbl> <dbl> <dbl>
#1 Q1         10    37    12
#2 Q2         20    98    22

A hack can also be to use col1 and col3 as grouping variable since we know they are the same every quarter. This would work provided every quarter has a different value for col1 and col3 combination.

survey1 %>% 
    group_by(col1, col3) %>% 
    summarise(col2 = sum(col2), .groups = 'drop')

CodePudding user response:

I provided alternative solution

  1. create a mock month's column (dynamically expands by row of data)
  2. then create quarter column
  3. group_by quarter
  4. For cols1&3 take the minimum values since it doesn't change
  5. for cols2 take the sum
survey1 %>% 
  mutate(months=1:dim(survey1)[1],
         quarter=lubridate::quarter(months)) %>% 
  group_by(quarter) %>% 
  summarize(col1_min=min(col1),
            col2_sum=sum(col2),
            col3_min=min(col3))
  • Related