I have data based on a user's behavior on chess.com. I'm trying to get data on total puzzles done by a user. Unfortunately, players can reset their puzzles, causing their overall number of puzzles completed (LifeTimeCumulative) to go down. It can go all the way to zero, or to any other number. My data looks like the data below. How do I go from the LifetimeCumulative Column, where sometimes the number goes down, to LifeTimeCumulativeCorrect. Notice that in LifeTimeCumulativeCorrect, if the LifeTimeCumulative goes down, then LifeTimeCumulativeCorrect does not go down. Basically, LifeTimeCumulativeCorrect is summing only the positive increases in LifeTimeCumulative.
library(dplyr)
library(lubridate)
set.seed(1)
observations = 100
dateStart = as.Date("2022-1-1")
dateEnd = as.Date("2022-1-04")
dates = seq( dateStart, dateEnd, by = "day")
players = c("A","B","C")
df = expand.grid( player=players, date=dates )
df = df %>%
arrange( player, date ) %>%
mutate( LifeTimeCumulative = sample.int( 3, nrow(df), replace = TRUE) - 1 )
LifeTimeCumulativeCorrect <- c(0,2,2,3,0,2,2,2,1,2,2,2)
df$LifeTimeCumulativeCorrect <- LifeTimeCumulativeCorrect
CodePudding user response:
We can calculate the increments with diff
and use pmax
to set all negative increments to 0. Then we cumsum
this corrected increment:
df %>%
group_by(player) %>%
mutate(
increment = c(first(LifeTimeCumulative), pmax(diff(LifeTimeCumulative), 0)),
corrected = cumsum(increment)
) %>%
ungroup()
# # A tibble: 12 × 6
# player date LifeTimeCumulative LifeTimeCumulativeCorrect increment corrected
# <fct> <date> <dbl> <dbl> <dbl> <dbl>
# 1 A 2022-01-01 0 0 0 0
# 2 A 2022-01-02 2 2 2 2
# 3 A 2022-01-03 0 2 0 2
# 4 A 2022-01-04 1 3 1 3
# 5 B 2022-01-01 0 0 0 0
# 6 B 2022-01-02 2 2 2 2
# 7 B 2022-01-03 2 2 0 2
# 8 B 2022-01-04 1 2 0 2
# 9 C 2022-01-01 1 1 1 1
# 10 C 2022-01-02 2 2 1 2
# 11 C 2022-01-03 2 2 0 2
# 12 C 2022-01-04 0 2 0 2