I have the following df and i wish to average the values in each year to leave me with 1 value for each of the years in the df (1 value for overall and 1 for eastern here), this would leave me with a df of 3 columns and 3 rows on this example. I tried to use tapply for this but had no joy as yet
df
date overall eastern
1 1997-12-15 17.548 18.751
2 1998-01-15 16.189 17.155
3 1998-02-15 15.667 16.504
4 1998-03-15 15.509 16.208
5 1998-04-15 16.709 17.702
6 1998-05-15 18.822 19.660
7 1998-06-15 22.722 23.512
8 1998-07-15 25.372 25.912
9 1998-08-15 26.597 27.226
10 1998-09-15 25.256 26.151
11 1998-10-15 22.857 24.440
12 1998-11-15 20.242 21.867
13 1998-12-15 17.179 18.797
14 1999-01-15 16.003 17.206
15 1999-02-15 15.140 16.345
16 1999-03-15 15.522 16.689
17 1999-04-15 16.537 17.521
18 1999-05-15 19.658 20.740
19 1999-06-15 23.245 24.161
20 1999-07-15 25.313 26.053
21 1999-08-15 26.753 27.112
22 1999-09-15 26.040 26.597
23 1999-10-15 23.843 24.940
24 1999-11-15 20.940 22.375
25 1999-12-15 17.842 19.439
CodePudding user response:
We convert the 'date' to Date
class (with lubridate::ymd
or as.Date
from base R
), extract the year
as grouping variable and summarise
across
the columns to get the mean
value
library(dplyr)
library(lubridate)
df1 %>%
group_by(year = year(ymd(date))) %>%
summarise(across(overall:eastern, mean, na.rm = TRUE))
-output
# A tibble: 3 × 3
year overall eastern
<dbl> <dbl> <dbl>
1 1997 17.5 18.8
2 1998 20.3 21.3
3 1999 20.6 21.6
If we also want by seasons, create a key value dataset with month
and seasons
values, join and do a group by mean
keydat <- tibble(seasons = rep(c("Winter", "Spring", "Summer", "Fall"),
each = 3), month = c("Dec", month.abb[-length(month.abb)]))
df1 %>%
mutate(date = as.Date(date), month = format(date, '%b'),
year = format(date, '%Y')) %>%
left_join(keydat) %>%
group_by(year, seasons) %>%
summarise(across(c(overall, eastern), mean, na.rm = TRUE),
.groups = 'drop')
-output
# A tibble: 9 × 4
year seasons overall eastern
<chr> <chr> <dbl> <dbl>
1 1997 Winter 17.5 18.8
2 1998 Fall 22.8 24.2
3 1998 Spring 17.0 17.9
4 1998 Summer 24.9 25.6
5 1998 Winter 16.3 17.5
6 1999 Fall 23.6 24.6
7 1999 Spring 17.2 18.3
8 1999 Summer 25.1 25.8
9 1999 Winter 16.3 17.7
Or in base R
aggregate(.~ year, transform(df1, year = format(as.Date(date),
'%Y'))[-1], FUN = mean)
year overall eastern
1 1997 17.54800 18.75100
2 1998 20.26008 21.26117
3 1999 20.56967 21.59817
data
df1 <- structure(list(date = c("1997-12-15", "1998-01-15", "1998-02-15",
"1998-03-15", "1998-04-15", "1998-05-15", "1998-06-15", "1998-07-15",
"1998-08-15", "1998-09-15", "1998-10-15", "1998-11-15", "1998-12-15",
"1999-01-15", "1999-02-15", "1999-03-15", "1999-04-15", "1999-05-15",
"1999-06-15", "1999-07-15", "1999-08-15", "1999-09-15", "1999-10-15",
"1999-11-15", "1999-12-15"), overall = c(17.548, 16.189, 15.667,
15.509, 16.709, 18.822, 22.722, 25.372, 26.597, 25.256, 22.857,
20.242, 17.179, 16.003, 15.14, 15.522, 16.537, 19.658, 23.245,
25.313, 26.753, 26.04, 23.843, 20.94, 17.842), eastern = c(18.751,
17.155, 16.504, 16.208, 17.702, 19.66, 23.512, 25.912, 27.226,
26.151, 24.44, 21.867, 18.797, 17.206, 16.345, 16.689, 17.521,
20.74, 24.161, 26.053, 27.112, 26.597, 24.94, 22.375, 19.439)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25"))