How to find series with more than one date and keep only the first date of each series in R?

Time:11-05

I have a data frame like the one below:

df<- structure(list(TIME = c("25/01/2011", "25/01/2011", "25/01/2011", 
"25/01/2011", "16/11/2011", "26/01/2011", "16/11/2011", "16/11/2011", 
"25/01/2011", "25/01/2011", "16/11/2011", "16/11/2011", "27/09/2011", 
"16/11/2011", "16/11/2011", "07/07/2012", "16/11/2011", "21/09/2012", 
"16/11/2011", "26/01/2011"), Series = c(1L, 1L, 1L, 1L, 9L, 1L, 
9L, 9L, 1L, 1L, 9L, 9L, 8L, 9L, 9L, 14L, 9L, 16L, 9L, 1L), block = c(2L, 
2L, 2L, 1L, 2L, 3L, 2L, 2L, 1L, 1L, 1L, 3L, 1L, 3L, 3L, 3L, 1L, 
2L, 1L, 3L), RE = c(1.28, 1.52, 2.9, 1.15, 1.72, 0.22, 2.45, 
2.32, 1.2, 1.13, 1.9, 0.78, 5.06, 2.03, 1.7, 5.62, 1.93, 4.21, 
2.16, 0.59)), row.names = c(NA, 20L), class = "data.frame")

In the next step, I need to calculate the mean of RE by TIME and block. However, as you can see from df, a series sometimes corresponds to 2 or 3 different dates. I want to find all such cases and change the date to the first date within the same series. I have no idea how to do this. With a loop, maybe?

res <- tapply(df$RE, list(df$TIME, df$block), mean) # returns a TIME x block matrix of means

Hope someone can help. Thanks!

CodePudding user response:

Group by 'Series', convert 'TIME' to Date class, replace it with the minimum (earliest) date in each group, then add 'TIME' as a grouping variable before getting the mean of 'RE'.

library(dplyr)
library(lubridate)
df %>%
  group_by(Series) %>%
  mutate(TIME = min(dmy(TIME))) %>%   # earliest date within each Series
  group_by(TIME, .add = TRUE) %>%     # now grouped by Series and TIME
  summarise(RE = mean(RE), .groups = 'drop')
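
For comparison, here is a minimal base R sketch of the same idea that also covers the two things the question asks for: listing the series that have more than one date, and producing the TIME by block table of means. It assumes that "first date" means the earliest date in each series (as with min() above), not the first row in file order. The names dates, n_dates, multi, first_num and res are just illustrative.

```r
# Data from the question, rebuilt as a plain data.frame.
df <- data.frame(
  TIME = c("25/01/2011", "25/01/2011", "25/01/2011", "25/01/2011",
           "16/11/2011", "26/01/2011", "16/11/2011", "16/11/2011",
           "25/01/2011", "25/01/2011", "16/11/2011", "16/11/2011",
           "27/09/2011", "16/11/2011", "16/11/2011", "07/07/2012",
           "16/11/2011", "21/09/2012", "16/11/2011", "26/01/2011"),
  Series = c(1L, 1L, 1L, 1L, 9L, 1L, 9L, 9L, 1L, 1L,
             9L, 9L, 8L, 9L, 9L, 14L, 9L, 16L, 9L, 1L),
  block = c(2L, 2L, 2L, 1L, 2L, 3L, 2L, 2L, 1L, 1L,
            1L, 3L, 1L, 3L, 3L, 3L, 1L, 2L, 1L, 3L),
  RE = c(1.28, 1.52, 2.9, 1.15, 1.72, 0.22, 2.45, 2.32, 1.2, 1.13,
         1.9, 0.78, 5.06, 2.03, 1.7, 5.62, 1.93, 4.21, 2.16, 0.59)
)

# Parse the day/month/year strings into Date objects.
dates <- as.Date(df$TIME, format = "%d/%m/%Y")

# 1) Find the series that have more than one distinct date.
n_dates <- tapply(dates, df$Series, function(x) length(unique(x)))
multi <- names(n_dates)[n_dates > 1]
multi  # for this data: "1" (Series 1 has 25/01/2011 and 26/01/2011)

# 2) Replace every date with the earliest date of its series.
#    ave() returns a vector as long as the input; working on the numeric
#    value of the dates avoids any class handling inside ave().
first_num <- ave(as.numeric(dates), df$Series, FUN = min)
df$TIME <- as.Date(first_num, origin = "1970-01-01")

# 3) Mean of RE by TIME and block. format() gives ISO date strings, so the
#    rows of the result sort chronologically. Empty combinations are NA.
res <- tapply(df$RE, list(format(df$TIME), df$block), mean)
res
```

No loop is needed: ave() applies min within each series and puts the result back in the original row order. If you would rather keep the date of the first row of each series instead of the earliest one, FUN = function(x) x[1] should do that.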