Home > Software design >  Creating a sequence along dates grouped by a factor, but repeating the same sequence when dates repe
Creating a sequence along dates grouped by a factor, but repeating the same sequence when dates repe

Time:01-05

I have a table, say...

df <- data.frame(
name = c('Jack', 'Jack', 'Jack', 'Jill', 'Jill', 'Jill', 'Jill'),
date = c('01-01-2023', '01-01-2023', '01-02-2023', '01-01-2023', '01-02-2023', '01-02-2023', '01-03-2023') )

Jack, 01-01-2023
Jack, 01-01-2023
Jack, 01-02-2023
Jill, 01-01-2023
Jill, 01-02-2023
Jill, 01-02-2023
Jill, 01-03-2023

And I can create a simple sequence along dates by name:

df %\>%  
group_by(name) %\>%  
mutate(date_sequence = seq(1,n(),1)) %\>%  
ungroup()

To get:

Jack, 01-01-2023, 1
Jack, 01-01-2023, 2
Jack, 01-02-2023, 3
Jill, 01-01-2023, 1
Jill, 01-02-2023, 2
Jill, 01-02-2023, 3
Jill, 01-03-2023, 4

But what I really want is:

Jack, 01-01-2023, 1
Jack, 01-01-2023, 1
Jack, 01-02-2023, 2
Jill, 01-01-2023, 1
Jill, 01-02-2023, 2
Jill, 01-02-2023, 2
Jill, 01-03-2023, 3

Where the sequence only grows if the date changes. Ideally if it goes up, but IRL I've already sorted the data.

I have tried pasting name and date, then grouping on that; grouping by name and date; got desperate and tried a few different combinations of nesting grouping and mutate sort of blindly; started thinking about a series of loops and then decided to come here.

CodePudding user response:

We can use match instead

library(dplyr)
df %>% 
   group_by(name) %>% 
   mutate(date_sequence = match(date, unique(date))) %>% 
   ungroup

-output

# A tibble: 7 × 3
  name  date       date_sequence
  <chr> <chr>              <int>
1 Jack  01-01-2023             1
2 Jack  01-01-2023             1
3 Jack  01-02-2023             2
4 Jill  01-01-2023             1
5 Jill  01-02-2023             2
6 Jill  01-02-2023             2
7 Jill  01-03-2023             3

CodePudding user response:

An alternative is rle.id from data.table:

df %>%
  group_by(name) %>%
  mutate(date_sequence = data.table::rleid(date))
#> # A tibble: 7 x 3
#> # Groups:   name [2]
#>   name  date       date_sequence
#>   <chr> <chr>              <int>
#> 1 Jack  01-01-2023             1
#> 2 Jack  01-01-2023             1
#> 3 Jack  01-02-2023             2
#> 4 Jill  01-01-2023             1
#> 5 Jill  01-02-2023             2
#> 6 Jill  01-02-2023             2
#> 7 Jill  01-03-2023             3

Created on 2023-01-04 with reprex v2.0.2

CodePudding user response:

I remembered hitting this problem before and dug out an old code chunk while trying to format this question. I decided to put this out there to a) feel better about how long I spent formulating/formatting this question and b) maybe help somebody else.

df %>%
  group_by(name) %>%
  mutate(entry_seq = cumsum(date != lag(date, default = first(date)))   1)

Worked like a charm. But I'd still like to see other approaches if people have them.

Apologies for the slashes in the pipes of my original post. I didn't realize they were added while I was messing around with formatting.

  • Related