Home > Enterprise >  Impute missing values for missing dates
Impute missing values for missing dates

Time:10-18

Imagine I have the following two data frames:

> sp
        date value
1 2004-08-20     1
2 2004-08-23     2
3 2004-08-24     4
4 2004-08-25     5
5 2004-08-26    10
6 2004-08-27    11

> other
        date value
1 2004-08-20     2
2 2004-08-23     4
3 2004-08-24     5
4 2004-08-25    10
5 2004-08-27    11

where the first columns represents the dates and the second the values for each day. The matrix of reference is sp and I want to impute to the matrix other the missing dates and values with respect to sp. For instance, in this case I miss the date "2004-08-26" in the matrix other. I should add to the matrix other a new row, with the date "2004-08-26" and the value which is given by the mean of the values at "2004-08-25" and "2004-08-27".

Could anyone suggest me how I can do it?

Data

sp <- data.frame(date=c("2004-08-20", "2004-08-23", "2004-08-24", "2004-08-25",
                        "2004-08-26", "2004-08-27"), value=c(1, 2, 4, 5, 10, 11))

other <- data.frame(date=c("2004-08-20", "2004-08-23", "2004-08-24", "2004-08-25",
                           "2004-08-27"), value=c(2, 4, 5, 10, 11))

CodePudding user response:

If I understand correctly, you want to add dates from sp that are missing in other.

You can merge other with just the date column of sp. Note, that by default from one-column data frames (and matrices) dimensions are dropped, so we need drop=FALSE.

The resulting NA can be e.g. linearly interpolated using approx, which gives the desired mean.

other2 <- merge(other, sp[, 'date', drop=FALSE], all=TRUE) |>
  transform(value=approx(value, xout=seq_along(value))$y)
other2
#         date value
# 1 2004-08-20   2.0
# 2 2004-08-23   4.0
# 3 2004-08-24   5.0
# 4 2004-08-25  10.0
# 5 2004-08-26  10.5  ## interpolated
# 6 2004-08-27  11.0

CodePudding user response:

An option using zoo::na.approx :

library(dplyr)

sp %>%
  select(date) %>%
  left_join(other, by = 'date') %>%
  mutate(value = zoo::na.approx(value))

#        date value
#1 2004-08-20   2.0
#2 2004-08-23   4.0
#3 2004-08-24   5.0
#4 2004-08-25  10.0
#5 2004-08-26  10.5
#6 2004-08-27  11.0
  • Related