I am working with the R programming language. I have a data frame that looks like this:
startdate <- c('2010-01-01','2010-01-01','2010-01-01', '2010-01-02','2010-01-03','2010-01-03')
event <- c(1,1,1,1,1,1)
my_data <- data.frame(startdate, event)
startdate event
1 2010-01-01 1
2 2010-01-01 1
3 2010-01-01 1
4 2010-01-02 1
5 2010-01-03 1
6 2010-01-03 1
Note: The actual value of "startdate" is "POSIXct" and is written as "year-month-date".
I am trying to take a cumulative sum of "event" according to the "startdate" column. The result should look like this
startdate <- c('2010-01-01', '2010-01-02' ,'2010-01-03')
event <- c(3,4,6)
my_data_2 <- data.frame(startdate, event)
#desired file
startdate event
1 2010-01-01 3
2 2010-01-02 4
3 2010-01-03 6
I tried to do this with the "dplyr" library:
library(dplyr)
new_file = my_data %>% group_by(startdate) %>% mutate(cumsum_value = cumsum(event))
But this returns something slightly different and non-intended:
startdate event cumsum_value
<chr> <dbl> <dbl>
1 2010-01-01 1 1
2 2010-01-01 1 2
3 2010-01-01 1 3
4 2010-01-02 1 1
5 2010-01-03 1 1
6 2010-01-03 1 2
Can someone please show me how to fix this?
Thanks
CodePudding user response:
my_data %>%
mutate(cumsum = cumsum(event)) %>%
group_by(startdate) %>%
summarise(max(cumsum))
# A tibble: 3 × 2
startdate `max(cumsum)`
<chr> <dbl>
1 2010-01-01 3
2 2010-01-02 4
3 2010-01-03 6
CodePudding user response:
mutate
theevent
column and calculatecumsum
group_by
startdate
andsummarise
max(event)
library(dplyr)
my_data %>%
mutate(event = cumsum(event)) %>%
group_by(startdate) %>%
summarise(event = max(event))
```
```
startdate event
<chr> <dbl>
1 2010-01-01 3
2 2010-01-02 4
3 2010-01-03 6
```
CodePudding user response:
Another option is also to make use of duplicated
and thus avoiding the group_by
. Also, if the 'event' column is just 1, instead of doing cumsum
, we could use the built-in function row_number()
to create a sequence
library(dplyr)
my_data %>%
mutate(event = row_number()) %>%
filter(!duplicated(startdate, fromLast = TRUE))