Home > OS >  How to reset a sequence in a R dataframe's column, based on the levels of a different column
How to reset a sequence in a R dataframe's column, based on the levels of a different column

Time:12-07

I have data that logs subject and session information in columns like so:

> Subject <- c(rep("A", 6), rep("B", 7))
> Session <- c(rep(1, 3), rep(2, 3), rep(8, 4), rep(9, 3))
> df <- data.frame(Subject, Session)
> df
   Subject Session
1        A       1
2        A       1
3        A       1
4        A       2
5        A       2
6        A       2
7        B       8
8        B       8
9        B       8
10       B       8
11       B       9
12       B       9
13       B       9

The problem is that the raw data does not always have consistent session numbering (hence why subject B's first session starts at session 8 and not 1). Ideally, it would be helpful to have a column that resets the session numbering when each new subject appears. Like this:

   Subject Session New_Sess
1        A       1        1
2        A       1        1
3        A       1        1
4        A       2        2
5        A       2        2
6        A       2        2
7        B       8        1
8        B       8        1
9        B       8        1
10       B       8        1
11       B       9        2
12       B       9        2
13       B       9        2

I've attempted a solution with nested for loops and could not manage to get that to work. I suspect there must be a straightforward solution to this (it doesn't seem like it should be that difficult) but despite my best efforts, it has evaded me completely.

The actual dataset is quite large and cumbersome, so an automated solution to this problem would be helpful.

CodePudding user response:

We may use match

library(dplyr)
df <- df %>%
   group_by(Subject) %>% 
   mutate(New_Sess = match(Session, unique(Session))) %>%
   ungroup

-output

df
# A tibble: 13 × 3
   Subject Session New_Sess
   <chr>     <dbl>    <int>
 1 A             1        1
 2 A             1        1
 3 A             1        1
 4 A             2        2
 5 A             2        2
 6 A             2        2
 7 B             8        1
 8 B             8        1
 9 B             8        1
10 B             8        1
11 B             9        2
12 B             9        2
13 B             9        2

Or with factor

df <-  df %>%
     group_by(Subject) %>% 
     mutate(New_Sess = as.integer(factor(Session))) %>% 
     ungroup
  • Related