R: Add numeric sequence including negative values starting from middle of group-CodePudding

An example dataframe with 2 columns:

groupID <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
index_ad <- c( 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0)

df <- data.frame(groupID, index_ad)

I want to add another column with a sequence for each group starting at the row where index_ad = 1 and then adding sequential positive/negative numbers depending on whether the row comes before or after the row where index_ad = 1.

ep_id <- c(0, 1, 2, 3, -2, -1, 0, 1, 2, -1, 0, 1, 2)
df1 <- data.frame(groupID, index_ad, ep_id)

I've tried using row_number, but that always starts from the first row in each group.

df <- df %>% group_by(groupID) %>% mutate(ep_num = row_number()) %>% ungroup()

The real dataset has >10,000 rows and multiple other variables including date/times. The groups are arranged/sorted by date/time and the 'index_ad' variable refers to whether the case/row should be considered the index case for that group. All cases/rows before the index case have date/times that occurred before it and all cases/rows after it have date/times that occurred after it.

Please help me figure out how to add the 'ep_id' numeric sequence using R! Thankyou!

CodePudding user response：

You can try

library(dplyr)

df |> group_by(groupID) |> mutate(ep_id = 1:n() - which(index_ad == 1))

output

# A tibble: 13 × 3
# Groups:   groupID [3]
   groupID index_ad ep_id
     <dbl>    <dbl> <int>
 1       1        1     0
 2       1        0     1
 3       1        0     2
 4       1        0     3
 5       2        0    -2
 6       2        0    -1
 7       2        1     0
 8       2        0     1
 9       2        0     2
10       3        0    -1
11       3        1     0
12       3        0     1
13       3        0     2

CodePudding user response：

df %>%
  group_by(groupID) %>%
  mutate(row  = row_number(),
         ep_num = row - row[index_ad == 1]) %>%
  ungroup()

# A tibble: 13 × 4
   groupID index_ad   row ep_num
     <dbl>    <dbl> <int>  <int>
 1       1        1     1      0
 2       1        0     2      1
 3       1        0     3      2
 4       1        0     4      3
 5       2        0     1     -2
 6       2        0     2     -1
 7       2        1     3      0
 8       2        0     4      1
 9       2        0     5      2
10       3        0     1     -1
11       3        1     2      0
12       3        0     3      1
13       3        0     4      2

CodePudding user response：

Here is a way. Subtract which index row is equal to 1 from the row number to get the result.

groupID <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
index_ad <- c( 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0)

df <- data.frame(groupID, index_ad)

suppressPackageStartupMessages(library(dplyr))

df %>%
  group_by(groupID) %>%
  mutate(ep_num = row_number(),
         ep_num = ep_num - which(index_ad == 1)) %>%
  ungroup()
#> # A tibble: 13 × 3
#>    groupID index_ad ep_num
#>      <dbl>    <dbl>  <int>
#>  1       1        1      0
#>  2       1        0      1
#>  3       1        0      2
#>  4       1        0      3
#>  5       2        0     -2
#>  6       2        0     -1
#>  7       2        1      0
#>  8       2        0      1
#>  9       2        0      2
#> 10       3        0     -1
#> 11       3        1      0
#> 12       3        0      1
#> 13       3        0      2

^{Created on 2022-08-12 by the reprex package (v2.0.1)}

I have coded the mutate above in two lines to make it clearer but it can be simplified to

df %>%
  group_by(groupID) %>%
  mutate(ep_num = row_number() - which(index_ad == 1)) %>%
  ungroup()