Home > Net >  R: How to fill out values in a DF which are dependent on previous rows
R: How to fill out values in a DF which are dependent on previous rows

Time:09-22

I have a dataframe, and I want to do some calculations depending on the previous rows (like dragging informations down in excel). My DF looks like this:

set.seed(1234)
df <- data.frame(DA = sample(1:3, 6, rep = TRUE) ,HB = sample(0:600, 6, rep = TRUE), D = sample(1:5, 6, rep = TRUE), AD = sample(1:14, 6, rep = TRUE), GM = sample(30:31, 6, rep = TRUE), GL = NA, R =NA, RM =0  )
df$GL[1] = 646
df$R[1] = 60
df$DA[5] = 2

df
#   DA  HB D AD GM  GL  R RM
# 1  2 399 4 13 30 646 60  0
# 2  2  97 4 10 31  NA NA  0
# 3  1 102 5  5 31  NA NA  0
# 4  3 325 4  2 31  NA NA  0
# 5  2  78 3 14 30  NA NA  0
# 6  1 269 4  8 30  NA NA  0

I want to fill out the missing values in my GL, R and RM columns, and the values are dependent on each other. So eg.

attach(df)

#calc GL and R for the 2nd row

df$GL[2] <- GL[1] HB[2] RM[1]

df$R[2] <- df$GL[2]*D[2]/GM[2]*AD[2]

#calc GL and R for the 3rd row

df$GL[3] <- df$GL[2] HB[3] df$RM[2]
df$R[3] <-df$GL[3]*D[3]/GM[3]*AD[3]

#and so on..

Is there a way to do all the calculations at once, instead of row by row?

In addition, each time the column 'DA' = 2, the previous values for 'R' should be summed up for the same row for 'RM', but only from the last occurence. So that

attach(df)

df$RM[3] <-R[1] R[2] R[3]

#and RM for the 6th row is calculated by

#df$RM[6] <-R[4] R[5] R[6]

Thanks a lot in advance!

CodePudding user response:

You can use a for loop to calculate GL values and once you have them you can do the calculation for R columns directly.

for(i in 2:nrow(df)) {
  df$GL[i] <- with(df, GL[i-1] HB[i] RM[i-1])
}
df$R <- with(df, (GL* D)/(GM *AD))

CodePudding user response:

You can use indexing to solve the first two problems:

> # Original code from question~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> set.seed(1234)
> df <- data.frame(DA = sample(1:3, 6, rep = TRUE), HB = sample(0:600, 6, rep = TRUE),
                   D = sample(1:5, 6, rep = TRUE), AD = sample(1:14, 6, rep = TRUE),
                   GM = sample(30:31, 6, rep = TRUE), GL = NA, R =NA, RM =0  )
> df$GL[1] = 646
> df$R[1] = 60
> df$DA[5] = 2
> #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

> # View df
> df
  DA  HB D AD GM  GL  R RM
1  2 399 4 13 30 646 60  0
2  2  97 4 10 31  NA NA  0
3  1 102 5  5 31  NA NA  0
4  3 325 4  2 31  NA NA  0
5  2  78 3 14 30  NA NA  0
6  1 269 4  8 30  NA NA  0

> # Solution below, based on indexing
> # 1. GL column
> df$GL <- cumsum(c(df$GL[1], df$HB[-1]   df$RM[-nrow(df)]))

> # 2. R column
> df$R[-1] <- (df$GL * df$D / df$GM * df$AD)[-1]
> # May be more clear like this (same result)
> df$R[-1] <- df$GL[-1] * df$D[-1] / df$GM[-1] * df$AD[-1]
> # Or did you mean this for last *?
> df$R[-1] <- (df$GL * df$D / (df$GM * df$AD))[-1]

The third problem can be solved with a loop.

> df$RM[1] <- df$R[1]
> for (i in 2:nrow(df)) {
    df$RM[i] <- df$R[i]   df$RM[i-1] * (df$DA[i] != 2)
  }

> df
  DA  HB D AD GM   GL         R         RM
1  2 399 4 13 30  646 60.000000  60.000000
2  2  97 4 10 31  743  9.587097   9.587097
3  1 102 5  5 31  845 27.258065  36.845161
4  3 325 4  2 31 1170 75.483871 112.329032
5  2  78 3 14 30 1248  8.914286   8.914286
6  1 269 4  8 30 1517 25.283333  34.197619

Do these results look correct?

  • Related