I have a problem. I have the following data frame.

1	2
NA	100
1.00499	NA
1.00813	NA
0.99203	NA

Two columns. In the second column, apart from the starting value, there are only NAs. I want to fill the first NA of the 2nd column by multiplying the 1st value from column 2 with the 2nd value from column 1 (100* 1.00499). The 3rd value of column 2 should be the product of the 2nd new created value in column 2 and the 3rd value in column 1 and so on. So that at the end the NAs are replaced by values.

These two sources have helped me understand how to refer to different rows. But in both cases a new column is created.I don't want that. I want to fill the already existing column 2.

Use a value from the previous row in an R data.table calculation

https://statisticsglobe.com/use-previous-row-of-data-table-in-r

Can anyone help me?

Thanks so much in advance.

Sample code

library(quantmod)
data.N225<-getSymbols("^N225",from="1965-01-01", to="2022-03-30", auto.assign=FALSE, src='yahoo') 
data.N225[c(1:3, nrow(data.N225)),]

data.N225<- na.omit(data.N225)
N225 <- data.N225[,6]


N225$DiskreteRendite= Delt(N225$N225.Adjusted)
N225[c(1:3,nrow(N225)),]

options(digits=5)
N225.diskret <- N225[,3]
N225.diskret[c(1:3,nrow(N225.diskret)),]

N225$diskretplus1 <- N225$DiskreteRendite 1
N225[c(1:3,nrow(N225)),]

library(dplyr)
N225$normiert <-"Value"
N225$normiert[1,] <-100
N225[c(1:3,nrow(N225)),]

N225.new <- N225[,4:5]
N225.new[c(1:3,nrow(N225.new)),]

Here is the code to create the data frame in R studio.

a <- c(NA, 1.0050,1.0081, 1.0095, 1.0016,0.9947)
b <- c(100, NA, NA, NA, NA, NA)
c<- data.frame(ONE = a, TWO=b)

CodePudding user response：

You could use cumprod for cummulative product

transform(
    df,
    TWO = cumprod(c(na.omit(TWO),na.omit(ONE)))
)

which yields

     ONE      TWO
1     NA 100.0000
2 1.0050 100.5000
3 1.0081 101.3140
4 1.0095 102.2765
5 1.0016 102.4402
6 0.9947 101.8972

data

> dput(df)
structure(list(ONE = c(NA, 1.005, 1.0081, 1.0095, 1.0016, 0.9947
), TWO = c(100, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-6L))

CodePudding user response：

What about (gasp) a for loop?

(I'll use dat instead of c for your dataframe to avoid confusion with function c()).

for (row in 2:nrow(dat)) {
        if (!is.na(dat$TWO[row-1])) {
                dat$TWO[row] <- dat$ONE[row] * dat$TWO[row-1]
        }
}

This means: For each row from the second to the end, if the TWO in the previous row is not a missing value, calculate the TWO in this row by multiplying ONE in the current row and TWO from the previous row.

Output:

#>      ONE      TWO
#> 1     NA 100.0000
#> 2 1.0050 100.5000
#> 3 1.0081 101.3140
#> 4 1.0095 102.2765
#> 5 1.0016 102.4402
#> 6 0.9947 101.8972

^{Created on 2022-04-28 by the reprex package (v2.0.1)}

I'd love to read a dplyr solution!