I have a data set (ds
) with the variables date
, values
and break.
. I'd like to create a rule where a preserved de high value between two events of break == "True" and the other values must be 0.
In my example:
# Packages
library(dplyr)
library(lubridate)
# My data set
ds <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/accumulated_values_table")
ds$date <- ymd(ds$date)
ds <- ds %>%
mutate(break. = break. == "True")
str(ds)
# 'data.frame': 4458 obs. of 4 variables:
# $ id : int 0 1 2 2 1 0 3 2 4 1 ...
# $ date : Date, format: "2020-11-26" "2020-11-26" "2020-11-26" "2020-11-27" ...
# $ values: int 36 33 27 22 34 28 5 18 28 24 ...
# $ break.: logi FALSE FALSE FALSE FALSE TRUE FALSE ...
With this rule my desirable output for the ìd
==4 for example is the creation of values2 column below:
ds.id4 <- ds[ds$id==4,]
ds.id4 [1:11,]
# id date values break. values2
# 9 4 2020-11-28 28 FALSE 28
# 15 4 2020-11-29 19 FALSE 0
# 16 4 2020-11-30 21 FALSE 0
# 20 4 2020-12-01 24 FALSE 0
# 26 4 2020-12-02 20 TRUE 0
# 30 4 2020-12-03 8 FALSE 0
# 36 4 2020-12-04 11 TRUE 11
# 43 4 2020-12-05 0 FALSE 0
# 49 4 2020-12-06 0 FALSE 0
# 55 4 2020-12-07 1 FALSE 0
# 63 4 2020-12-08 3 TRUE 3
# ...
Please, could someone help me with it? Thanks in advance!
CodePudding user response:
You can create a new grouping variable with the help of cumsum
in each id
and keep the value
if it is max in the group.
library(dplyr)
result <- ds %>%
arrange(id) %>%
group_by(id, group = cumsum(lag(break., default = TRUE))) %>%
mutate(values2 = ifelse(values == max(values), values, 0)) %>%
ungroup()
Checking the output.
result %>% filter(id == 4) %>% head(12)
# id date values break. group values2
# <int> <date> <int> <lgl> <int> <dbl>
# 1 4 2020-11-28 28 FALSE 51 28
# 2 4 2020-11-29 19 FALSE 51 0
# 3 4 2020-11-30 21 FALSE 51 0
# 4 4 2020-12-01 24 FALSE 51 0
# 5 4 2020-12-02 20 TRUE 51 0
# 6 4 2020-12-03 8 FALSE 52 0
# 7 4 2020-12-04 11 TRUE 52 11
# 8 4 2020-12-05 0 FALSE 53 0
# 9 4 2020-12-06 0 FALSE 53 0
#10 4 2020-12-07 1 FALSE 53 0
#11 4 2020-12-08 3 TRUE 53 3
#12 4 2020-12-09 0 FALSE 54 0
You can drop the group
column if not needed with select(-group)
.