I have a table, say...
df <- data.frame(
name = c('Jack', 'Jack', 'Jack', 'Jill', 'Jill', 'Jill', 'Jill'),
date = c('01-01-2023', '01-01-2023', '01-02-2023', '01-01-2023', '01-02-2023', '01-02-2023', '01-03-2023') )
Jack, 01-01-2023
Jack, 01-01-2023
Jack, 01-02-2023
Jill, 01-01-2023
Jill, 01-02-2023
Jill, 01-02-2023
Jill, 01-03-2023
And I can create a simple sequence along dates by name:
df %\>%
group_by(name) %\>%
mutate(date_sequence = seq(1,n(),1)) %\>%
ungroup()
To get:
Jack, 01-01-2023, 1
Jack, 01-01-2023, 2
Jack, 01-02-2023, 3
Jill, 01-01-2023, 1
Jill, 01-02-2023, 2
Jill, 01-02-2023, 3
Jill, 01-03-2023, 4
But what I really want is:
Jack, 01-01-2023, 1
Jack, 01-01-2023, 1
Jack, 01-02-2023, 2
Jill, 01-01-2023, 1
Jill, 01-02-2023, 2
Jill, 01-02-2023, 2
Jill, 01-03-2023, 3
Where the sequence only grows if the date changes. Ideally if it goes up, but IRL I've already sorted the data.
I have tried pasting name and date, then grouping on that; grouping by name and date; got desperate and tried a few different combinations of nesting grouping and mutate sort of blindly; started thinking about a series of loops and then decided to come here.
CodePudding user response:
We can use match
instead
library(dplyr)
df %>%
group_by(name) %>%
mutate(date_sequence = match(date, unique(date))) %>%
ungroup
-output
# A tibble: 7 × 3
name date date_sequence
<chr> <chr> <int>
1 Jack 01-01-2023 1
2 Jack 01-01-2023 1
3 Jack 01-02-2023 2
4 Jill 01-01-2023 1
5 Jill 01-02-2023 2
6 Jill 01-02-2023 2
7 Jill 01-03-2023 3
CodePudding user response:
An alternative is rle.id
from data.table:
df %>%
group_by(name) %>%
mutate(date_sequence = data.table::rleid(date))
#> # A tibble: 7 x 3
#> # Groups: name [2]
#> name date date_sequence
#> <chr> <chr> <int>
#> 1 Jack 01-01-2023 1
#> 2 Jack 01-01-2023 1
#> 3 Jack 01-02-2023 2
#> 4 Jill 01-01-2023 1
#> 5 Jill 01-02-2023 2
#> 6 Jill 01-02-2023 2
#> 7 Jill 01-03-2023 3
Created on 2023-01-04 with reprex v2.0.2
CodePudding user response:
I remembered hitting this problem before and dug out an old code chunk while trying to format this question. I decided to put this out there to a) feel better about how long I spent formulating/formatting this question and b) maybe help somebody else.
df %>%
group_by(name) %>%
mutate(entry_seq = cumsum(date != lag(date, default = first(date))) 1)
Worked like a charm. But I'd still like to see other approaches if people have them.
Apologies for the slashes in the pipes of my original post. I didn't realize they were added while I was messing around with formatting.