Home > OS >  How do I fix a cumulative sum in R when dealing with negative values
How do I fix a cumulative sum in R when dealing with negative values

Time:08-03

I have data based on a user's behavior on chess.com. I'm trying to get data on total puzzles done by a user. Unfortunately, players can reset their puzzles, causing their overall number of puzzles completed (LifeTimeCumulative) to go down. It can go all the way to zero, or to any other number. My data looks like the data below. How do I go from the LifetimeCumulative Column, where sometimes the number goes down, to LifeTimeCumulativeCorrect. Notice that in LifeTimeCumulativeCorrect, if the LifeTimeCumulative goes down, then LifeTimeCumulativeCorrect does not go down. Basically, LifeTimeCumulativeCorrect is summing only the positive increases in LifeTimeCumulative.

library(dplyr)
library(lubridate)

set.seed(1)

observations = 100

dateStart = as.Date("2022-1-1")
dateEnd   = as.Date("2022-1-04")
dates = seq( dateStart, dateEnd, by = "day")

players = c("A","B","C")

df = expand.grid( player=players, date=dates )

df = df %>% 
  arrange( player, date ) %>% 
  mutate( LifeTimeCumulative = sample.int( 3, nrow(df), replace = TRUE) - 1 )


LifeTimeCumulativeCorrect <- c(0,2,2,3,0,2,2,2,1,2,2,2)

df$LifeTimeCumulativeCorrect <- LifeTimeCumulativeCorrect


Data

CodePudding user response:

We can calculate the increments with diff and use pmax to set all negative increments to 0. Then we cumsum this corrected increment:

df %>%
  group_by(player) %>%
  mutate(
    increment = c(first(LifeTimeCumulative), pmax(diff(LifeTimeCumulative), 0)),
    corrected = cumsum(increment)
  ) %>%
  ungroup()
# # A tibble: 12 × 6
#    player date       LifeTimeCumulative LifeTimeCumulativeCorrect increment corrected
#    <fct>  <date>                  <dbl>                     <dbl>     <dbl>     <dbl>
#  1 A      2022-01-01                  0                         0         0         0
#  2 A      2022-01-02                  2                         2         2         2
#  3 A      2022-01-03                  0                         2         0         2
#  4 A      2022-01-04                  1                         3         1         3
#  5 B      2022-01-01                  0                         0         0         0
#  6 B      2022-01-02                  2                         2         2         2
#  7 B      2022-01-03                  2                         2         0         2
#  8 B      2022-01-04                  1                         2         0         2
#  9 C      2022-01-01                  1                         1         1         1
# 10 C      2022-01-02                  2                         2         1         2
# 11 C      2022-01-03                  2                         2         0         2
# 12 C      2022-01-04                  0                         2         0         2  
  • Related