Combining observations with different time scales (e.g., hourly counts with daily weather)-CodePudding

I have a data set that has daily means of weather variables and a second set that has hourly counts of animal observations. I want to combine these datasets so that I can build a model evaluating mixed effect of weather variables on counts. To make things just a bit more complicated, the data formats are different with observations having a separate column for month, day, and hour and the weather starting with a single Dd/M/-yyyy HH:mm column. I could hand fix this in the raw data for the observations if necessary.

How do I do this best in R? Sample data:

date <- c("1/1/2020 3:00", "1/2/2020 3:00","1/3/2020 3:00")
temp <- c(18, 25, 10)
press <- c(.25, .5, 1.25)
met <- data.frame(date, press, temp)

month <- c(1, 1, 1, 1, 1, 1, 1, 1, 1)
day <- c(1, 1, 1, 2, 2, 2, 3, 3,3)
hour <- c(10, 11, 12, 10, 11, 12,10, 11, 12)
obs <- c( 14, 88, 67, 198, 3, 54, 2, 80, 36)
counts <- data.frame(month, day, hour, obs)

CodePudding user response：

Edited to spread a daily observation across all hours.

Here's some code using dplyr and lubridate. The function mutate creates new columns in the data frames, parse_date_time creates an object with a data time class and floor_date finds the day when the observation happened.

The functions group_by and summarise combine observations to create a mean for a given day.

The two are combined using inner_join.

I've assumed 2020.

library(lubridate)
library(dplyr)

date <- c("1/1/2020 3:00", "1/2/2020 3:00","1/3/2020 3:00")
temp <- c(18, 25, 10)
press <- c(.25, .5, 1.25)
met <- 
    data.frame(date, press, temp) %>% 
    mutate(dmyhm = parse_date_time(date, orders='m/d/y H:M')) %>%
    mutate(ymd = floor_date(dmyhm, unit = 'day')) %>%
    select(ymd, press, temp)

month <- c(1, 1, 1, 1, 1, 1, 1, 1, 1)
day <- c(1, 1, 1, 2, 2, 2, 3, 3,3)
hour <- c(10, 11, 12, 10, 11, 12,10, 11, 12)
obs <- c( 14, 88, 67, 198, 3, 54, 2, 80, 36)
counts <- 
    data.frame(month, day, hour, obs) %>% 
    mutate(dmyhm = parse_date_time(paste0(month, '/', day, '/', '2020 ', hour), orders='m/d/y H')) %>%
    mutate(ymd = floor_date(dmyhm, unit='day')) 

# a full join causes extra rows to be added where they 
# don't exist in the smaller data set

combined <- 
    counts %>% 
    full_join(met, by = 'ymd') %>%
    select(month, day, hour, obs, press, temp)

combined
#   month day hour obs press temp
#   1     1   1   10  14  0.25   18
#   2     1   1   11  88  0.25   18
#   3     1   1   12  67  0.25   18
#   4     1   2   10 198  0.50   25
#   5     1   2   11   3  0.50   25
#   6     1   2   12  54  0.50   25
#   7     1   3   10   2  1.25   10
#   8     1   3   11  80  1.25   10
#   9     1   3   12  36  1.25   10