Home > Back-end >  Adding Dates and Names Together in R DataFrame
Adding Dates and Names Together in R DataFrame

Time:11-03

I am trying to create a rolling average calculation in R, but I only have data for certain dates within a range. However, instead of having dates where there is no data be totally omitted from the rolling average calculation, I would like the dates where there is no data to be counted as 0 in the rolling average calculation.

I have created an example data frame to demonstrate, but in my actual data set I obviously have much more dates and names. Additionally, I am trying to figure out a way to make the names be added where they are added for each day that is 0.

Here is the code for an example DF

Date <- Date <- as.Date(c('2021-01-01','2021-01-01','2021-01-01', '2021-01-03','2021-01-03','2021-01-03', '2021-01-04','2021-01-04','2021-01-04', '2021-02-02','2021-02-02','2021-02-02', '2021-02-03','2021-02-03','2021-02-03', '2021-03-01','2021-03-01','2021-03-01'))
Name <- c("John Smith", "Jane Peters", "Jim Clark","John Smith", "Jane Peters", "Jim Clark","John Smith", "Jane Peters", "Jim Clark","John Smith", "Jane Peters", "Jim Clark","John Smith", "Jane Peters", "Jim Clark","John Smith", "Jane Peters", "Jim Clark")
Hours <- c(floor(runif(18, min=0, max = 14)))
data.frame(Date, Name, Hours)

With the above dataframe looking like this

         Date        Name Hours
1  2021-01-01  John Smith    11
2  2021-01-01 Jane Peters     9
3  2021-01-01   Jim Clark     6
4  2021-01-03  John Smith     7
5  2021-01-03 Jane Peters     1
6  2021-01-03   Jim Clark     9
7  2021-01-04  John Smith     2
8  2021-01-04 Jane Peters     4
9  2021-01-04   Jim Clark    10
10 2021-02-02  John Smith     2
11 2021-02-02 Jane Peters     1
12 2021-02-02   Jim Clark     3
13 2021-02-03  John Smith     8
14 2021-02-03 Jane Peters     7
15 2021-02-03   Jim Clark     0
16 2021-03-01  John Smith    11
17 2021-03-01 Jane Peters     6
18 2021-03-01   Jim Clark     8

To illustrate, I would like to add a day where the "Hours" is 0 for each day for each person where there is no data.

The end result would look something like this for the first few days in January, but extending until the end date at the beginning of March.

1  2021-01-01  John Smith    11
2  2021-01-01 Jane Peters     9
3  2021-01-01   Jim Clark     6
4  2021-01-02  John Smith     0
5  2021-01-02 Jane Peters     0
6  2021-01-02   Jim Clark     0
4  2021-01-03  John Smith     7
5  2021-01-03 Jane Peters     1
6  2021-01-03   Jim Clark     9
7  2021-01-04  John Smith     2
8  2021-01-04 Jane Peters     4
9  2021-01-04   Jim Clark    10
10 2021-01-05  John Smith     0
11 2021-01-05 Jane Peters     0
12 2021-01-05   Jim Clark     0

This way, I will be able to add the days where there is no data as 0's into my rolling average calculation.

CodePudding user response:

It sounds like you are looking for the tidyr::complete function. That would look like this

library(tidyr)
complete(mydata, Date=seq(min(Date), max(Date), by="1 day"), Name, fill=list(Hours=0))

CodePudding user response:

Just with base R you could merge your data to an expand.grid.

res <- merge(dat, 
             expand.grid(
               Date=seq(as.Date('2021-01-01'), as.Date('2021-03-01'), 'days'),
               Name=unique(dat$Name)),
             all=TRUE)
res <- transform(res, Hours=replace(Hours, is.na(Hours),  0)) 
head(res)
#         Date        Name Hours
# 1 2021-01-01 Jane Peters     9
# 2 2021-01-01   Jim Clark     6
# 3 2021-01-01  John Smith    11
# 4 2021-01-02 Jane Peters     0
# 5 2021-01-02   Jim Clark     0
# 6 2021-01-02  John Smith     0

Data

dat <- structure(list(Date = structure(c(18628, 18628, 18628, 18630, 
18630, 18630, 18631, 18631, 18631, 18660, 18660, 18660, 18661, 
18661, 18661, 18687, 18687, 18687), class = "Date"), Name = c("John Smith", 
"Jane Peters", "Jim Clark", "John Smith", "Jane Peters", "Jim Clark", 
"John Smith", "Jane Peters", "Jim Clark", "John Smith", "Jane Peters", 
"Jim Clark", "John Smith", "Jane Peters", "Jim Clark", "John Smith", 
"Jane Peters", "Jim Clark"), Hours = c(11L, 9L, 6L, 7L, 1L, 9L, 
2L, 4L, 10L, 2L, 1L, 3L, 8L, 7L, 0L, 11L, 6L, 8L)), row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18"), class = "data.frame")
  • Related