Imagine I have the following two data frames:
> sp
date value
1 2004-08-20 1
2 2004-08-23 2
3 2004-08-24 4
4 2004-08-25 5
5 2004-08-26 10
6 2004-08-27 11
> other
date value
1 2004-08-20 2
2 2004-08-23 4
3 2004-08-24 5
4 2004-08-25 10
5 2004-08-27 11
where the first columns represents the dates and the second the values for each day. The matrix of reference is sp
and I want to impute to the matrix other the missing dates and values with respect to sp
. For instance, in this case I miss the date "2004-08-26"
in the matrix other
. I should add to the matrix other a new row, with the date "2004-08-26"
and the value which is given by the mean of the values at "2004-08-25"
and "2004-08-27"
.
Could anyone suggest me how I can do it?
Data
sp <- data.frame(date=c("2004-08-20", "2004-08-23", "2004-08-24", "2004-08-25",
"2004-08-26", "2004-08-27"), value=c(1, 2, 4, 5, 10, 11))
other <- data.frame(date=c("2004-08-20", "2004-08-23", "2004-08-24", "2004-08-25",
"2004-08-27"), value=c(2, 4, 5, 10, 11))
CodePudding user response:
If I understand correctly, you want to add dates from sp
that are missing in other
.
You can merge
other
with just the date
column of sp
. Note, that by default from one-column data frames (and matrices) dimensions are dropped, so we need drop=FALSE
.
The resulting NA
can be e.g. linearly interpolated using approx
, which gives the desired mean.
other2 <- merge(other, sp[, 'date', drop=FALSE], all=TRUE) |>
transform(value=approx(value, xout=seq_along(value))$y)
other2
# date value
# 1 2004-08-20 2.0
# 2 2004-08-23 4.0
# 3 2004-08-24 5.0
# 4 2004-08-25 10.0
# 5 2004-08-26 10.5 ## interpolated
# 6 2004-08-27 11.0
CodePudding user response:
An option using zoo::na.approx
:
library(dplyr)
sp %>%
select(date) %>%
left_join(other, by = 'date') %>%
mutate(value = zoo::na.approx(value))
# date value
#1 2004-08-20 2.0
#2 2004-08-23 4.0
#3 2004-08-24 5.0
#4 2004-08-25 10.0
#5 2004-08-26 10.5
#6 2004-08-27 11.0