I have a dataframe, and I want to do some calculations depending on the previous rows (like dragging informations down in excel). My DF looks like this:
set.seed(1234)
df <- data.frame(DA = sample(1:3, 6, rep = TRUE) ,HB = sample(0:600, 6, rep = TRUE), D = sample(1:5, 6, rep = TRUE), AD = sample(1:14, 6, rep = TRUE), GM = sample(30:31, 6, rep = TRUE), GL = NA, R =NA, RM =0 )
df$GL[1] = 646
df$R[1] = 60
df$DA[5] = 2
df
# DA HB D AD GM GL R RM
# 1 2 399 4 13 30 646 60 0
# 2 2 97 4 10 31 NA NA 0
# 3 1 102 5 5 31 NA NA 0
# 4 3 325 4 2 31 NA NA 0
# 5 2 78 3 14 30 NA NA 0
# 6 1 269 4 8 30 NA NA 0
I want to fill out the missing values in my GL, R and RM columns, and the values are dependent on each other. So eg.
attach(df)
#calc GL and R for the 2nd row
df$GL[2] <- GL[1] HB[2] RM[1]
df$R[2] <- df$GL[2]*D[2]/GM[2]*AD[2]
#calc GL and R for the 3rd row
df$GL[3] <- df$GL[2] HB[3] df$RM[2]
df$R[3] <-df$GL[3]*D[3]/GM[3]*AD[3]
#and so on..
Is there a way to do all the calculations at once, instead of row by row?
In addition, each time the column 'DA' = 2, the previous values for 'R' should be summed up for the same row for 'RM', but only from the last occurence. So that
attach(df)
df$RM[3] <-R[1] R[2] R[3]
#and RM for the 6th row is calculated by
#df$RM[6] <-R[4] R[5] R[6]
Thanks a lot in advance!
CodePudding user response:
You can use a for
loop to calculate GL
values and once you have them you can do the calculation for R
columns directly.
for(i in 2:nrow(df)) {
df$GL[i] <- with(df, GL[i-1] HB[i] RM[i-1])
}
df$R <- with(df, (GL* D)/(GM *AD))
CodePudding user response:
You can use indexing to solve the first two problems:
> # Original code from question~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> set.seed(1234)
> df <- data.frame(DA = sample(1:3, 6, rep = TRUE), HB = sample(0:600, 6, rep = TRUE),
D = sample(1:5, 6, rep = TRUE), AD = sample(1:14, 6, rep = TRUE),
GM = sample(30:31, 6, rep = TRUE), GL = NA, R =NA, RM =0 )
> df$GL[1] = 646
> df$R[1] = 60
> df$DA[5] = 2
> #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> # View df
> df
DA HB D AD GM GL R RM
1 2 399 4 13 30 646 60 0
2 2 97 4 10 31 NA NA 0
3 1 102 5 5 31 NA NA 0
4 3 325 4 2 31 NA NA 0
5 2 78 3 14 30 NA NA 0
6 1 269 4 8 30 NA NA 0
> # Solution below, based on indexing
> # 1. GL column
> df$GL <- cumsum(c(df$GL[1], df$HB[-1] df$RM[-nrow(df)]))
> # 2. R column
> df$R[-1] <- (df$GL * df$D / df$GM * df$AD)[-1]
> # May be more clear like this (same result)
> df$R[-1] <- df$GL[-1] * df$D[-1] / df$GM[-1] * df$AD[-1]
> # Or did you mean this for last *?
> df$R[-1] <- (df$GL * df$D / (df$GM * df$AD))[-1]
The third problem can be solved with a loop.
> df$RM[1] <- df$R[1]
> for (i in 2:nrow(df)) {
df$RM[i] <- df$R[i] df$RM[i-1] * (df$DA[i] != 2)
}
> df
DA HB D AD GM GL R RM
1 2 399 4 13 30 646 60.000000 60.000000
2 2 97 4 10 31 743 9.587097 9.587097
3 1 102 5 5 31 845 27.258065 36.845161
4 3 325 4 2 31 1170 75.483871 112.329032
5 2 78 3 14 30 1248 8.914286 8.914286
6 1 269 4 8 30 1517 25.283333 34.197619
Do these results look correct?