Home > Enterprise >  How to calculate a value row that is based on the previous value in the same column (in R)?
How to calculate a value row that is based on the previous value in the same column (in R)?

Time:08-27

For a time series analysis, I have a data frame that is based on periods. I have a column with starters and a column with patients who discontinue therapy. Now I want to calculate the current users of therapy per period. I calculated the difference between 'starters' and 'discontinue' as an intermediate step.

> df <- data.frame(
period = c(1,2,3,4,5,6), 
starters = c(595, 113, 36, 489, 28, 101),
discontinue = c(0, 11, 6, 8, 14, 8))

> df$difference <- df$starters - df$discontinue
> df

  period starters discontinue difference
1      1      595           0        595
2      2      113          11        102
3      3       36           6         30
4      4      489           8        481
5      5       28          14         14
6      6      101           8         93

To calculate current users, I would like to add a column 'current.users' with initial value = df$difference[1] and adds the difference each row. Output should look like:

> df
  period starters discontinue difference current.users
1      1      595           0        595           595
2      2      113          11        102           697
3      3       36           6         30           727
4      4      489           8        481          1208
5      5       28          14         14          1222
6      6      101           8         93          1315

I tried to use for loops and data.table, but I can not calculate the next value in the column based on the previous value in the same column. Does anyone know the correct code for this issue? Thanks in advance!

CodePudding user response:

You can do a simple cumsum like this:

df <- data.frame(
  period = c(1,2,3,4,5,6), 
  starters = c(595, 113, 36, 489, 28, 101),
  discontinue = c(0, 11, 6, 8, 14, 8))

df$difference <- df$starters - df$discontinue
# cumsum
df$current.users <- cumsum(df$difference)
df
#>   period starters discontinue difference current.users
#> 1      1      595           0        595           595
#> 2      2      113          11        102           697
#> 3      3       36           6         30           727
#> 4      4      489           8        481          1208
#> 5      5       28          14         14          1222
#> 6      6      101           8         93          1315

Created on 2022-08-26 with reprex v2.0.2

CodePudding user response:

With dplyr

library(tidyverse)

df %>%  
  mutate(difference = starters - discontinue, 
         current.users = cumsum(difference))

  period starters discontinue difference current.users
1      1      595           0        595           595
2      2      113          11        102           697
3      3       36           6         30           727
4      4      489           8        481          1208
5      5       28          14         14          1222
6      6      101           8         93          1315

Without the difference column

df %>%  
  mutate(current.users = cumsum(starters - discontinue))

  period starters discontinue current.users
1      1      595           0           595
2      2      113          11           697
3      3       36           6           727
4      4      489           8          1208
5      5       28          14          1222
6      6      101           8          1315
  • Related