I have a data frame like the one below:
df<- structure(list(TIME = c("25/01/2011", "25/01/2011", "25/01/2011",
"25/01/2011", "16/11/2011", "26/01/2011", "16/11/2011", "16/11/2011",
"25/01/2011", "25/01/2011", "16/11/2011", "16/11/2011", "27/09/2011",
"16/11/2011", "16/11/2011", "07/07/2012", "16/11/2011", "21/09/2012",
"16/11/2011", "26/01/2011"), Series = c(1L, 1L, 1L, 1L, 9L, 1L,
9L, 9L, 1L, 1L, 9L, 9L, 8L, 9L, 9L, 14L, 9L, 16L, 9L, 1L), block = c(2L,
2L, 2L, 1L, 2L, 3L, 2L, 2L, 1L, 1L, 1L, 3L, 1L, 3L, 3L, 3L, 1L,
2L, 1L, 3L), RE = c(1.28, 1.52, 2.9, 1.15, 1.72, 0.22, 2.45,
2.32, 1.2, 1.13, 1.9, 0.78, 5.06, 2.03, 1.7, 5.62, 1.93, 4.21,
2.16, 0.59)), row.names = c(NA, 20L), class = "data.frame")
In the next step, I need to calculate the mean of RE by TIME and block. However, as you can see from df, a series sometimes corresponds to 2 or 3 different dates. I want to find all such cases and change the date to the first date within the same series. I have no idea how to do this; with a loop, maybe?
res <- tapply(df$RE, list(df$TIME, df$block), mean) # returns a matrix of means (TIME x block), not a list
I hope someone can help. Thanks!
CodePudding user response:
Group by 'Series', convert 'TIME' to Date class, take the min value within each group, and add 'TIME' as a grouping variable before computing the mean of 'RE'.
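library(dplyr)
library(lubridate)

df %>%
  group_by(Series) %>%
  # parse the dd/mm/yyyy strings and keep the earliest date within each Series
  mutate(TIME = min(dmy(TIME))) %>%
  # add TIME to the existing Series grouping
  group_by(TIME, .add = TRUE) %>%
  # mean of RE for each Series/TIME group
  summarise(RE = mean(RE), .groups = 'drop')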
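For comparison, here is a minimal base R sketch of the same idea that ends with the tapply() call from the question, so the means are computed by TIME and block. It assumes that "first date" means the earliest date within each Series. The name df_small is hypothetical: it is a small illustrative data frame built from a few rows of the question's data, not the full df.

```r
# Hypothetical small example using a few rows from the question's data
df_small <- data.frame(
  TIME   = c("25/01/2011", "25/01/2011", "26/01/2011", "26/01/2011",
             "16/11/2011", "16/11/2011", "27/09/2011"),
  Series = c(1L, 1L, 1L, 1L, 9L, 9L, 8L),
  block  = c(2L, 1L, 3L, 3L, 2L, 1L, 1L),
  RE     = c(1.28, 1.15, 0.22, 0.59, 1.72, 1.90, 5.06)
)

# Parse the dd/mm/yyyy strings; comparing the raw strings would not
# order them chronologically
dates <- as.Date(df_small$TIME, format = "%d/%m/%Y")

# ave() returns, for every row, the minimum (earliest) date of its Series.
# Dates are handled as day counts since 1970-01-01 and converted back.
first_dates <- ave(as.numeric(dates), df_small$Series, FUN = min)
df_small$TIME <- as.Date(first_dates, origin = "1970-01-01")

# Mean of RE by (adjusted) TIME and block; the result is a matrix with
# one row per date and one column per block, NA where no rows exist
res <- tapply(df_small$RE,
              list(as.character(df_small$TIME), df_small$block),
              mean)
res
```

In this sketch the two rows dated 26/01/2011 in Series 1 are moved to 25/01/2011 before averaging, so they contribute to the 2011-01-25 row of the result. No explicit loop is needed, because ave() applies the function within each Series and returns a vector aligned with the original rows.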