How can I create a variable that depends on its previous value as well as another variable in R-CodePudding

I am new to R and would really appreciate your help.

I have a panel dataset (firm,year) and other variables (number,X, and Y). X for all years is given, but I only have Y for the first year for each firm. I am interested in generating/replacing the values of Y for those periods where Y=NA

The formular for Y is given below.

Yt= Xt 0.85*Yt-1. i.e current value of Y depends on X as well as previous value of Y.

My question is how do I replace subsequent values of Y.

E.g

data<-data.frame(id=c(rep(1,3),rep(2,3)), year=c(2000:2002,2002:2004),number=rep(1:3,2), X=c(10,20,30,20,40,10), Y=c(30,NA,NA,50,NA,NA))

I have tried the following code but for some reason its not using the lag value of Y in the calculation.

library(dplyr)
library(plm)

data<- data %>%
  group_by(id)%>%
  mutate(Y = replace(Y, number!=1, 0.85*lag(data$Y)  X))

I hope to get the following result.

data<-data.frame(id=c(rep(1,3),rep(2,3)), year=c(2000:2002,2002:2004),number=rep(1:3,2), X=c(10,20,30,20,40,10), Y=c(30,45.5,68.675,50,82.5,80.125))

Can anyone tell me how to do this correctly?

CodePudding user response：

This is one of those cases where a simple loop works.

for (i in 1:nrow(data)) {
   if(is.na(data$Y[i])) {
      data$Y[i] <- 0.85*data$Y[i-1]   data$X[i]
   }
}
data

CodePudding user response：

We may use accumulate here

library(dplyr)
library(purrr)
data %>% 
   group_by(id) %>% 
   mutate(Y = accumulate(X[-1], ~ 0.85 * .x   .y, .init = first(Y))) %>% 
   ungroup

-output

# A tibble: 6 × 5
     id  year number     X     Y
  <dbl> <int>  <int> <dbl> <dbl>
1     1  2000      1    10  30  
2     1  2001      2    20  45.5
3     1  2002      3    30  68.7
4     2  2002      1    20  50  
5     2  2003      2    40  82.5
6     2  2004      3    10  80.1

data

data <- structure(list(id = c(1, 1, 1, 2, 2, 2), year = c(2000L, 2001L, 
2002L, 2002L, 2003L, 2004L), number = c(1L, 2L, 3L, 1L, 2L, 3L
), X = c(10, 20, 30, 20, 40, 10), Y = c(30, NA, NA, 50, NA, NA
)), class = "data.frame", row.names = c(NA, -6L))