Home > Back-end >  Aggregating mean beyond given date R
Aggregating mean beyond given date R

Time:07-25

I have a large dataframe (Merged) of several thousand lines and 20 columns. Each line is a record of a site visit. There are 400 sites in the dataframe and each has multiple lines for visits on different months. Each one also has an assessment method which might be changed in the year. Example:

Site Date Method Result
2200 21/06/2021 24 0.22
2201 03/07/2021 21 0.01
2203 03/07/2021 21 0.02
2200 01/08/2021 21 0.05
2201 03/08/2021 21 0.01
2203 03/08/2021 21 0.02
2200 01/09/2021 21 0.05

And so on for all 400 sites over two years. I have been pulling out some summary stats from this including a mean score for each site:

OPmean <- aggregate(Merged$Result, list(Merged$Site), FUN=mean, na.rm = TRUE

This works fine however I have decided to take the mean for site only from the beginning of their latest Method code.

In the table above for site 2200 we can see that the method changed in August 2021 and stayed that way for the rest of the period. I want to get the mean value for any data points for site 2200 after this change happened. I want to do this for every line, method codes can change at any time but it will happen maybe once per year. I'm not interested in any data points before the method code changed.

In summary, I want my code to:

  • identify the most recent method code
  • identify the first time the most recent method code was used
  • take a mean value of every data point from that change onwards.

I wrote some rather janky code which identifies IF the method has changed:

method <- aggregate(Merged$Method, list(Merged$Site), FUN=mean, na.rm = TRUE)
  
ch<-  method %>% 
  mutate(
    change = case_when(
      x == 21.00000 ~ "No",
      x == 24.00000 ~ "No",
      x != 24.00000 |21.00000  ~ "Yes"
    )
  )

But I am struggling to adapt that to say when the code changed and then don't have any idea how to use that to aggregate the sites and calculate the mean after.

Help appreciated!

Here is some data:

Site <- c("2200", "2201", "2203", "2200", "2201", "2203", "2200"  )
date <- c("21/06/2021", "03/07/202", "03/07/2021", "01/08/2021", "03/08/2021", "03/08/2021", "01/09/2021" )
Method <- c("24", "21", "21", "21", "21", "21", "21")
Result <- c(0.22, 0.01, 0.02, 0.05, 0.01, 0.02, 0.02)


Date<- as.Date(date, "%d/%m/%Y")


Merged <- data.frame(Site, Date, Method, Result)

CodePudding user response:

I am not sure what the desired output is. You can filter out any Method that is not equal to the last observed one and summarise their mean.

library(tidyverse)

Merged %>% 
  group_by(Site) %>% 
  filter(Method == last(Method)) %>%  
  summarise(mean_method = mean(Method))

# A tibble: 3 × 2
   Site mean_method
  <dbl>       <dbl>
1  2200          21
2  2201          21
3  2203          21
  •  Tags:  
  • r
  • Related