I am new to R and would really appreciate your help.
I have a panel dataset (firm,year) and other variables (number,X, and Y). X for all years is given, but I only have Y for the first year for each firm. I am interested in generating/replacing the values of Y for those periods where Y=NA
The formular for Y is given below.
Yt= Xt 0.85*Yt-1. i.e current value of Y depends on X as well as previous value of Y.
My question is how do I replace subsequent values of Y.
E.g
data<-data.frame(id=c(rep(1,3),rep(2,3)), year=c(2000:2002,2002:2004),number=rep(1:3,2), X=c(10,20,30,20,40,10), Y=c(30,NA,NA,50,NA,NA))
I have tried the following code but for some reason its not using the lag value of Y in the calculation.
library(dplyr)
library(plm)
data<- data %>%
group_by(id)%>%
mutate(Y = replace(Y, number!=1, 0.85*lag(data$Y) X))
I hope to get the following result.
data<-data.frame(id=c(rep(1,3),rep(2,3)), year=c(2000:2002,2002:2004),number=rep(1:3,2), X=c(10,20,30,20,40,10), Y=c(30,45.5,68.675,50,82.5,80.125))
Can anyone tell me how to do this correctly?
CodePudding user response:
This is one of those cases where a simple loop works.
for (i in 1:nrow(data)) {
if(is.na(data$Y[i])) {
data$Y[i] <- 0.85*data$Y[i-1] data$X[i]
}
}
data
CodePudding user response:
We may use accumulate
here
library(dplyr)
library(purrr)
data %>%
group_by(id) %>%
mutate(Y = accumulate(X[-1], ~ 0.85 * .x .y, .init = first(Y))) %>%
ungroup
-output
# A tibble: 6 × 5
id year number X Y
<dbl> <int> <int> <dbl> <dbl>
1 1 2000 1 10 30
2 1 2001 2 20 45.5
3 1 2002 3 30 68.7
4 2 2002 1 20 50
5 2 2003 2 40 82.5
6 2 2004 3 10 80.1
data
data <- structure(list(id = c(1, 1, 1, 2, 2, 2), year = c(2000L, 2001L,
2002L, 2002L, 2003L, 2004L), number = c(1L, 2L, 3L, 1L, 2L, 3L
), X = c(10, 20, 30, 20, 40, 10), Y = c(30, NA, NA, 50, NA, NA
)), class = "data.frame", row.names = c(NA, -6L))