Home > Back-end >  Averaging df columns/rows by year
Averaging df columns/rows by year

Time:10-25

I have the following df and i wish to average the values in each year to leave me with 1 value for each of the years in the df (1 value for overall and 1 for eastern here), this would leave me with a df of 3 columns and 3 rows on this example. I tried to use tapply for this but had no joy as yet

df 
    date       overall  eastern
1   1997-12-15  17.548  18.751
2   1998-01-15  16.189  17.155
3   1998-02-15  15.667  16.504
4   1998-03-15  15.509  16.208
5   1998-04-15  16.709  17.702
6   1998-05-15  18.822  19.660
7   1998-06-15  22.722  23.512
8   1998-07-15  25.372  25.912
9   1998-08-15  26.597  27.226
10  1998-09-15  25.256  26.151
11  1998-10-15  22.857  24.440
12  1998-11-15  20.242  21.867
13  1998-12-15  17.179  18.797
14  1999-01-15  16.003  17.206
15  1999-02-15  15.140  16.345
16  1999-03-15  15.522  16.689
17  1999-04-15  16.537  17.521
18  1999-05-15  19.658  20.740
19  1999-06-15  23.245  24.161
20  1999-07-15  25.313  26.053
21  1999-08-15  26.753  27.112
22  1999-09-15  26.040  26.597
23  1999-10-15  23.843  24.940
24  1999-11-15  20.940  22.375
25  1999-12-15  17.842  19.439

CodePudding user response:

We convert the 'date' to Date class (with lubridate::ymd or as.Date from base R), extract the year as grouping variable and summarise across the columns to get the mean value

library(dplyr)
library(lubridate)
df1 %>%
    group_by(year = year(ymd(date))) %>%
    summarise(across(overall:eastern, mean, na.rm = TRUE))

-output

# A tibble: 3 × 3
   year overall eastern
  <dbl>   <dbl>   <dbl>
1  1997    17.5    18.8
2  1998    20.3    21.3
3  1999    20.6    21.6

If we also want by seasons, create a key value dataset with month and seasons values, join and do a group by mean

keydat <- tibble(seasons = rep(c("Winter", "Spring", "Summer", "Fall"),
    each = 3), month = c("Dec", month.abb[-length(month.abb)]))
df1 %>%
   mutate(date = as.Date(date), month = format(date, '%b'),
   year = format(date, '%Y')) %>% 
  left_join(keydat) %>% 
  group_by(year, seasons) %>% 
  summarise(across(c(overall, eastern), mean, na.rm = TRUE), 
    .groups = 'drop')

-output

# A tibble: 9 × 4
  year  seasons overall eastern
  <chr> <chr>     <dbl>   <dbl>
1 1997  Winter     17.5    18.8
2 1998  Fall       22.8    24.2
3 1998  Spring     17.0    17.9
4 1998  Summer     24.9    25.6
5 1998  Winter     16.3    17.5
6 1999  Fall       23.6    24.6
7 1999  Spring     17.2    18.3
8 1999  Summer     25.1    25.8
9 1999  Winter     16.3    17.7

Or in base R

aggregate(.~ year, transform(df1, year = format(as.Date(date), 
   '%Y'))[-1], FUN = mean)
  year  overall  eastern
1 1997 17.54800 18.75100
2 1998 20.26008 21.26117
3 1999 20.56967 21.59817

data

df1 <- structure(list(date = c("1997-12-15", "1998-01-15", "1998-02-15", 
"1998-03-15", "1998-04-15", "1998-05-15", "1998-06-15", "1998-07-15", 
"1998-08-15", "1998-09-15", "1998-10-15", "1998-11-15", "1998-12-15", 
"1999-01-15", "1999-02-15", "1999-03-15", "1999-04-15", "1999-05-15", 
"1999-06-15", "1999-07-15", "1999-08-15", "1999-09-15", "1999-10-15", 
"1999-11-15", "1999-12-15"), overall = c(17.548, 16.189, 15.667, 
15.509, 16.709, 18.822, 22.722, 25.372, 26.597, 25.256, 22.857, 
20.242, 17.179, 16.003, 15.14, 15.522, 16.537, 19.658, 23.245, 
25.313, 26.753, 26.04, 23.843, 20.94, 17.842), eastern = c(18.751, 
17.155, 16.504, 16.208, 17.702, 19.66, 23.512, 25.912, 27.226, 
26.151, 24.44, 21.867, 18.797, 17.206, 16.345, 16.689, 17.521, 
20.74, 24.161, 26.053, 27.112, 26.597, 24.94, 22.375, 19.439)), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25"))
  •  Tags:  
  • r
  • Related