An example dataframe with 2 columns:
groupID <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
index_ad <- c( 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0)
df <- data.frame(groupID, index_ad)
I want to add another column with a sequence for each group starting at the row where index_ad = 1 and then adding sequential positive/negative numbers depending on whether the row comes before or after the row where index_ad = 1.
ep_id <- c(0, 1, 2, 3, -2, -1, 0, 1, 2, -1, 0, 1, 2)
df1 <- data.frame(groupID, index_ad, ep_id)
I've tried using row_number, but that always starts from the first row in each group.
df <- df %>% group_by(groupID) %>% mutate(ep_num = row_number()) %>% ungroup()
The real dataset has >10,000 rows and multiple other variables including date/times. The groups are arranged/sorted by date/time and the 'index_ad' variable refers to whether the case/row should be considered the index case for that group. All cases/rows before the index case have date/times that occurred before it and all cases/rows after it have date/times that occurred after it.
Please help me figure out how to add the 'ep_id' numeric sequence using R! Thankyou!
CodePudding user response:
- You can try
library(dplyr)
df |> group_by(groupID) |> mutate(ep_id = 1:n() - which(index_ad == 1))
- output
# A tibble: 13 × 3
# Groups: groupID [3]
groupID index_ad ep_id
<dbl> <dbl> <int>
1 1 1 0
2 1 0 1
3 1 0 2
4 1 0 3
5 2 0 -2
6 2 0 -1
7 2 1 0
8 2 0 1
9 2 0 2
10 3 0 -1
11 3 1 0
12 3 0 1
13 3 0 2
CodePudding user response:
df %>%
group_by(groupID) %>%
mutate(row = row_number(),
ep_num = row - row[index_ad == 1]) %>%
ungroup()
# A tibble: 13 × 4
groupID index_ad row ep_num
<dbl> <dbl> <int> <int>
1 1 1 1 0
2 1 0 2 1
3 1 0 3 2
4 1 0 4 3
5 2 0 1 -2
6 2 0 2 -1
7 2 1 3 0
8 2 0 4 1
9 2 0 5 2
10 3 0 1 -1
11 3 1 2 0
12 3 0 3 1
13 3 0 4 2
CodePudding user response:
Here is a way. Subtract which
index row is equal to 1 from the row number to get the result.
groupID <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
index_ad <- c( 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0)
df <- data.frame(groupID, index_ad)
suppressPackageStartupMessages(library(dplyr))
df %>%
group_by(groupID) %>%
mutate(ep_num = row_number(),
ep_num = ep_num - which(index_ad == 1)) %>%
ungroup()
#> # A tibble: 13 × 3
#> groupID index_ad ep_num
#> <dbl> <dbl> <int>
#> 1 1 1 0
#> 2 1 0 1
#> 3 1 0 2
#> 4 1 0 3
#> 5 2 0 -2
#> 6 2 0 -1
#> 7 2 1 0
#> 8 2 0 1
#> 9 2 0 2
#> 10 3 0 -1
#> 11 3 1 0
#> 12 3 0 1
#> 13 3 0 2
Created on 2022-08-12 by the reprex package (v2.0.1)
I have coded the mutate
above in two lines to make it clearer but it can be simplified to
df %>%
group_by(groupID) %>%
mutate(ep_num = row_number() - which(index_ad == 1)) %>%
ungroup()