Home > Software engineering >  Calculate means beyond a given date R
Calculate means beyond a given date R

Time:07-26

I have a large dataframe (Merged) of several thousand lines and 20 columns. Each line is a record of a site visit. There are 400 sites in the dataframe and each has multiple lines for visits on different months. . Example:

Site Date Method Result ChangeDate
2200 01/07/2021 24 0.22 01/07/2021
2201 03/07/2021 21 0.01 01/01/2021
2203 03/07/2021 21 0.02 01/01/2021
2200 01/08/2021 21 0.05 01/07/2021
2201 03/08/2021 21 0.01 01/01/2021
2203 03/08/2021 21 0.02 01/01/2021
2200 01/09/2021 21 0.05 01/07/2021

And so on for all 400 sites over two years.

I need to calculate the mean result for each site, which I can do:

OPmean <- aggregate(Merged$Result, list(Merged$Site), FUN=mean, na.rm = TRUE

But I only want to take the means after the change date. So on site 2200 I would want to take the mean on these lines only:

Site Date Method Result ChangeDate
2200 01/08/2021 21 0.05 01/07/2021
2200 01/09/2021 21 0.05 01/07/2021

Leaving the first line for 2200 off because the date occurs on (or before) the change date.

For an output I would want a new dataframe something like this:

Site Mean Result
2200 0.05
2201 0.01
2203 0.02

CodePudding user response:

Small change to your code

aggregate(
  Result~Site,
  data=df[df$Date>df$ChangeDate,],
  FUN=mean
)

  Site Result
1 2200   0.05
2 2201   0.01
3 2203   0.02

CodePudding user response:

@user2974951 gives a great base R answer. A tidyverse approach might be

library(dplyr) 
Merged %>% 
     group_by(Site) %>% 
     filter(Date > ChangeDate) %>% 
     summarize(mr = mean(Result) # or optionally, mean(Result, na.rm = TRUE)

  •  Tags:  
  • r
  • Related