I have a data set that has daily means of weather variables and a second set that has hourly counts of animal observations. I want to combine these datasets so that I can build a model evaluating mixed effect of weather variables on counts. To make things just a bit more complicated, the data formats are different with observations having a separate column for month, day, and hour and the weather starting with a single Dd/M/-yyyy HH:mm column. I could hand fix this in the raw data for the observations if necessary.
How do I do this best in R? Sample data:
date <- c("1/1/2020 3:00", "1/2/2020 3:00","1/3/2020 3:00")
temp <- c(18, 25, 10)
press <- c(.25, .5, 1.25)
met <- data.frame(date, press, temp)
month <- c(1, 1, 1, 1, 1, 1, 1, 1, 1)
day <- c(1, 1, 1, 2, 2, 2, 3, 3,3)
hour <- c(10, 11, 12, 10, 11, 12,10, 11, 12)
obs <- c( 14, 88, 67, 198, 3, 54, 2, 80, 36)
counts <- data.frame(month, day, hour, obs)
CodePudding user response:
Edited to spread a daily observation across all hours.
Here's some code using dplyr
and lubridate
. The function mutate
creates new columns in the data frames, parse_date_time
creates an object with a data time class and floor_date
finds the day when the observation happened.
The functions group_by
and summarise
combine observations to create a mean for a given day.
The two are combined using inner_join
.
I've assumed 2020.
library(lubridate)
library(dplyr)
date <- c("1/1/2020 3:00", "1/2/2020 3:00","1/3/2020 3:00")
temp <- c(18, 25, 10)
press <- c(.25, .5, 1.25)
met <-
data.frame(date, press, temp) %>%
mutate(dmyhm = parse_date_time(date, orders='m/d/y H:M')) %>%
mutate(ymd = floor_date(dmyhm, unit = 'day')) %>%
select(ymd, press, temp)
month <- c(1, 1, 1, 1, 1, 1, 1, 1, 1)
day <- c(1, 1, 1, 2, 2, 2, 3, 3,3)
hour <- c(10, 11, 12, 10, 11, 12,10, 11, 12)
obs <- c( 14, 88, 67, 198, 3, 54, 2, 80, 36)
counts <-
data.frame(month, day, hour, obs) %>%
mutate(dmyhm = parse_date_time(paste0(month, '/', day, '/', '2020 ', hour), orders='m/d/y H')) %>%
mutate(ymd = floor_date(dmyhm, unit='day'))
# a full join causes extra rows to be added where they
# don't exist in the smaller data set
combined <-
counts %>%
full_join(met, by = 'ymd') %>%
select(month, day, hour, obs, press, temp)
combined
# month day hour obs press temp
# 1 1 1 10 14 0.25 18
# 2 1 1 11 88 0.25 18
# 3 1 1 12 67 0.25 18
# 4 1 2 10 198 0.50 25
# 5 1 2 11 3 0.50 25
# 6 1 2 12 54 0.50 25
# 7 1 3 10 2 1.25 10
# 8 1 3 11 80 1.25 10
# 9 1 3 12 36 1.25 10