I've been a long-time lurker here, but this is finally my first question :) I know how to do what I want with a formula in Excel, but I can't seem to find a way to do it in R.
My attempt (shown further down) does not work, because it does not let me refer to the previous values of the new column while I'm still creating it.
Here is a reproducible example:
library(dplyr)
set.seed(42) ## for sake of reproducibility
dat <- data.frame(date=seq.Date(as.Date("2020-12-26"), as.Date("2020-12-31"), "day"))
This would be the output of the dataframe:
dat
date
1 2020-12-26
2 2020-12-27
3 2020-12-28
4 2020-12-29
5 2020-12-30
6 2020-12-31
Desired output:
date periodNumber
1 2020-12-26 1
2 2020-12-27 2
3 2020-12-28 3
4 2020-12-29 4
5 2020-12-30 5
6 2020-12-31 6
My try at this:
dat %>%
  mutate(periodLag = dplyr::lag(date)) %>%
  # fails: periodNumber does not exist yet when lag(periodNumber) is evaluated
  mutate(periodNumber = ifelse(is.na(periodLag) == TRUE, 1,
                        ifelse(date == periodLag, dplyr::lag(periodNumber), (dplyr::lag(periodNumber) + 1))))
Excel formula screenshot (https://i.ibb.co/FHq7sfL/screenshot.png)
Thanks for all the help! You all are the best!
CodePudding user response:
You could group by date and use dplyr's cur_group_id(), which gives each group a unique integer identifier:
library(dplyr)
set.seed(42)
# I used a larger example
dat <- data.frame(date=sample(seq.Date(as.Date("2020-12-26"), as.Date("2020-12-31"), "day"), size = 30, replace = TRUE))
dat %>%
  arrange(date) %>% # needs sorting because of the random example
  group_by(date) %>%
  mutate(periodNumber = cur_group_id())
This returns
# A tibble: 30 x 2
# Groups: date [6]
date periodNumber
<date> <int>
1 2020-12-26 1
2 2020-12-26 1
3 2020-12-26 1
4 2020-12-26 1
5 2020-12-26 1
6 2020-12-26 1
7 2020-12-26 1
8 2020-12-26 1
9 2020-12-27 2
10 2020-12-27 2
11 2020-12-27 2
12 2020-12-27 2
13 2020-12-27 2
14 2020-12-27 2
15 2020-12-27 2
16 2020-12-28 3
17 2020-12-28 3
18 2020-12-28 3
19 2020-12-29 4
20 2020-12-29 4
21 2020-12-29 4
22 2020-12-29 4
23 2020-12-29 4
24 2020-12-29 4
25 2020-12-30 5
26 2020-12-30 5
27 2020-12-30 5
28 2020-12-30 5
29 2020-12-30 5
30 2020-12-31 6
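If you would rather stay close to the Excel logic ("same date as the row above: keep the number, otherwise add 1"), here is a rough sketch of two ungrouped alternatives. It assumes the data is sorted by date, and the small data frame and the column names isNewPeriod and periodRank are made up for illustration only. The idea is to flag the rows where the date changes and take a running sum of those flags, so the new column never has to refer to itself. dense_rank() should give the same numbering without the helper column.

```r
library(dplyr)

# Small made-up example with repeated and skipped dates
dat <- data.frame(
  date = as.Date(c("2020-12-26", "2020-12-26", "2020-12-27",
                   "2020-12-29", "2020-12-29", "2020-12-31"))
)

res <- dat %>%
  arrange(date) %>%
  mutate(
    # TRUE where the date differs from the previous row;
    # the first row is compared with itself, so it is FALSE
    isNewPeriod = date != lag(date, default = date[1]),
    # running count of date changes, starting at 1
    periodNumber = 1L + cumsum(isNewPeriod),
    # alternative: rank of each date with no gaps between ranks
    periodRank = dense_rank(date)
  )

print(res)
```

Both columns number the distinct dates consecutively, so gaps in the calendar (no 2020-12-28 or 2020-12-30 here) do not create gaps in the period number.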