Home > other >  summing based on conditions from two dataframes and dealing with dates
summing based on conditions from two dataframes and dealing with dates

Time:07-01

I have two dataframes, one with climate data for every location and date across 4 years. The other data frame has a date for each day an animal was trapped at a site. I am trying to calculate the mean of each climate variable based on a specific amount of time before the day the animal was trapped (time length depends on variable in question).

climate <- data.frame(site=c(1,1,1,1,2,2,2,2,1,1,1,1), 
                      precip=c(0.1,0.2,0.1,0.1,0.5,0.2,0.3,0.1,0.2,0.1,0.1,0.5), 
                      humid=c(1,1,3,1,2,3,3,1,1,3,1,2),
                      date=c("6/13/2020","6/12/2020","6/11/2020","6/14/2020","6/13/2020","6/12/2020","6/11/2020","6/14/2020","2/13/2019","2/14/2019","2/15/2019","2/16/2019")) 

trap <- data.frame(site=c(1,2,3,3), date=c("7/1/2020","7/1/2020","7/2/2020","7/4/2020"))

> climate
   site precip humid      date
1     1    0.1     1 6/13/2020
2     1    0.2     1 6/12/2020
3     1    0.1     3 6/11/2020
4     1    0.1     1 6/14/2020
5     2    0.5     2 6/13/2020
6     2    0.2     3 6/12/2020
7     2    0.3     3 6/11/2020
8     2    0.1     1 6/14/2020
9     1    0.2     1 2/13/2019
10    1    0.1     3 2/14/2019
11    1    0.1     1 2/15/2019
12    1    0.5     2 2/16/2019
> trap
  site     date
1    1 7/1/2020
2    2 7/1/2020
3    3 7/2/2020
4    3 7/4/2020

I want to calculate the mean humid 18-20 days before the date written in the trap dataframe. So essentially what is the mean humid between 6/11/2020 and 6/13/2020 according to the climate data.frame for animals trapped on 7/1/2020. So for site 1 that would be: 1.667 and site 2 that would be 2.67.

I also want to calculate the sum of precipitation 497-500 days before the date written in the trap dataframe. So I would need to calculate the sum (total) precip between 2/13/2019 and 2/16/2019 for an animal trapped on 7/1/2020 at each site. So for site 1 precip would be 0.9.

I know how to create new columns in the trap data frame for mean precip and sum humid but I'm not sure where to start in terms of coding so that each value is calculated as described above and the data that corresponds to the correct date is used for the large dataset that contains many different trap dates.

Thank you very much, hopefully I am being clear in my description.

CodePudding user response:

I have a solution using functions from the tidyverse. It is always useful to convert date variables to the class date. With this class, you can make calculations. Note, that I renamed the date column in the trap data to trap_date. See comments for more details:

library(tidyverse)

climate <- data.frame(site=c(1,1,1,1,2,2,2,2,1,1,1,1), 
                      precip=c(0.1,0.2,0.1,0.1,0.5,0.2,0.3,0.1,0.2,0.1,0.1,0.5), 
                      humid=c(1,1,3,1,2,3,3,1,1,3,1,2),
                      date=c("6/13/2020","6/12/2020","6/11/2020","6/14/2020","6/13/2020","6/12/2020","6/11/2020","6/14/2020","2/13/2019","2/14/2019","2/15/2019","2/16/2019")) 

trap <- data.frame(site=c(1,2,3,3), trap_date=c("7/1/2020","7/1/2020","7/2/2020","7/4/2020"))

# merge data
data <- merge(climate, trap, by="site")

> head(data)
site precip humid       date  trap_date
1    1    0.1     1 2020-06-13 2020-07-01
2    1    0.2     1 2020-06-12 2020-07-01
3    1    0.1     3 2020-06-11 2020-07-01
4    1    0.1     1 2020-06-14 2020-07-01
5    1    0.2     1 2019-02-13 2020-07-01
6    1    0.1     3 2019-02-14 2020-07-01

# parse dates to class 'date'; enables calculations
data <- data %>%
  mutate(date = parse_date(date, format="%m/%d/%Y"),
         trap_date = parse_date(trap_date, format="%m/%d/%Y"))

For means:

# humid means
data %>% 
  group_by(site) %>%
  filter(date >= trap_date-20 & date <= trap_date-18) %>%
  summarise(mean = mean(humid))

# A tibble: 2 x 2
     site  mean
    <dbl> <dbl>
  1     1  1.67
  2     2  2.67

However, it seems that the range of 497 to 500 days before the trap date contains no observations. When I used your specified dates, I got the same result of 0.9:

# precip sums
data %>% 
  group_by(site) %>%
  filter(date >= trap_date-500 & date <= trap_date-497)

# A tibble: 0 x 5
# Groups:   site [0]
# ... with 5 variables: site <dbl>, precip <dbl>, humid <dbl>,
#   date <date>, trap_date <date>


# using your provided dates
data %>% 
  group_by(site) %>%
  filter(date >= as.Date("2019-02-13") & date <= as.Date("2019-02-16")) %>%
  summarise(sum = sum(precip))

# A tibble: 1 x 2
     site   sum
    <dbl> <dbl>
  1     1   0.9

Hope I can help.

  • Related