I am trying to create a rolling average calculation in R, but I only have data for certain dates within a range. However, instead of having dates where there is no data be totally omitted from the rolling average calculation, I would like the dates where there is no data to be counted as 0 in the rolling average calculation.
I have created an example data frame to demonstrate, but in my actual data set I obviously have much more dates and names. Additionally, I am trying to figure out a way to make the names be added where they are added for each day that is 0.
Here is the code for an example DF
Date <- Date <- as.Date(c('2021-01-01','2021-01-01','2021-01-01', '2021-01-03','2021-01-03','2021-01-03', '2021-01-04','2021-01-04','2021-01-04', '2021-02-02','2021-02-02','2021-02-02', '2021-02-03','2021-02-03','2021-02-03', '2021-03-01','2021-03-01','2021-03-01'))
Name <- c("John Smith", "Jane Peters", "Jim Clark","John Smith", "Jane Peters", "Jim Clark","John Smith", "Jane Peters", "Jim Clark","John Smith", "Jane Peters", "Jim Clark","John Smith", "Jane Peters", "Jim Clark","John Smith", "Jane Peters", "Jim Clark")
Hours <- c(floor(runif(18, min=0, max = 14)))
data.frame(Date, Name, Hours)
With the above dataframe looking like this
Date Name Hours
1 2021-01-01 John Smith 11
2 2021-01-01 Jane Peters 9
3 2021-01-01 Jim Clark 6
4 2021-01-03 John Smith 7
5 2021-01-03 Jane Peters 1
6 2021-01-03 Jim Clark 9
7 2021-01-04 John Smith 2
8 2021-01-04 Jane Peters 4
9 2021-01-04 Jim Clark 10
10 2021-02-02 John Smith 2
11 2021-02-02 Jane Peters 1
12 2021-02-02 Jim Clark 3
13 2021-02-03 John Smith 8
14 2021-02-03 Jane Peters 7
15 2021-02-03 Jim Clark 0
16 2021-03-01 John Smith 11
17 2021-03-01 Jane Peters 6
18 2021-03-01 Jim Clark 8
To illustrate, I would like to add a day where the "Hours" is 0 for each day for each person where there is no data.
The end result would look something like this for the first few days in January, but extending until the end date at the beginning of March.
1 2021-01-01 John Smith 11
2 2021-01-01 Jane Peters 9
3 2021-01-01 Jim Clark 6
4 2021-01-02 John Smith 0
5 2021-01-02 Jane Peters 0
6 2021-01-02 Jim Clark 0
4 2021-01-03 John Smith 7
5 2021-01-03 Jane Peters 1
6 2021-01-03 Jim Clark 9
7 2021-01-04 John Smith 2
8 2021-01-04 Jane Peters 4
9 2021-01-04 Jim Clark 10
10 2021-01-05 John Smith 0
11 2021-01-05 Jane Peters 0
12 2021-01-05 Jim Clark 0
This way, I will be able to add the days where there is no data as 0's into my rolling average calculation.
CodePudding user response:
It sounds like you are looking for the tidyr::complete
function. That would look like this
library(tidyr)
complete(mydata, Date=seq(min(Date), max(Date), by="1 day"), Name, fill=list(Hours=0))
CodePudding user response:
Just with base R you could merge
your data to an expand.grid
.
res <- merge(dat,
expand.grid(
Date=seq(as.Date('2021-01-01'), as.Date('2021-03-01'), 'days'),
Name=unique(dat$Name)),
all=TRUE)
res <- transform(res, Hours=replace(Hours, is.na(Hours), 0))
head(res)
# Date Name Hours
# 1 2021-01-01 Jane Peters 9
# 2 2021-01-01 Jim Clark 6
# 3 2021-01-01 John Smith 11
# 4 2021-01-02 Jane Peters 0
# 5 2021-01-02 Jim Clark 0
# 6 2021-01-02 John Smith 0
Data
dat <- structure(list(Date = structure(c(18628, 18628, 18628, 18630,
18630, 18630, 18631, 18631, 18631, 18660, 18660, 18660, 18661,
18661, 18661, 18687, 18687, 18687), class = "Date"), Name = c("John Smith",
"Jane Peters", "Jim Clark", "John Smith", "Jane Peters", "Jim Clark",
"John Smith", "Jane Peters", "Jim Clark", "John Smith", "Jane Peters",
"Jim Clark", "John Smith", "Jane Peters", "Jim Clark", "John Smith",
"Jane Peters", "Jim Clark"), Hours = c(11L, 9L, 6L, 7L, 1L, 9L,
2L, 4L, 10L, 2L, 1L, 3L, 8L, 7L, 0L, 11L, 6L, 8L)), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18"), class = "data.frame")