I am trying to create consecutive ID numbers that restart at 1 within each distinct study. I found an example dataset where this is done in the esid variable:
Browse[1]> dat <- dat.assink2016
Browse[1]> head(dat, 9)
study esid id yi vi pubstatus year deltype
1 1 1 1 0.9066 0.0740 1 4.5 general
2 1 2 2 0.4295 0.0398 1 4.5 general
3 1 3 3 0.2679 0.0481 1 4.5 general
4 1 4 4 0.2078 0.0239 1 4.5 general
5 1 5 5 0.0526 0.0331 1 4.5 general
6 1 6 6 -0.0507 0.0886 1 4.5 general
7 2 1 7 0.5117 0.0115 1 1.5 general
8 2 2 8 0.4738 0.0076 1 1.5 general
9 2 3 9 0.3544 0.0065 1 1.5 general
I would like to create the same variable for my own data. Can anyone show me how to do it?
CodePudding user response:
If the id column is consecutive within each study (i.e. no jumps or repeated values), you could subtract the minimum value of id for each study and add one:
# Example data
df = data.frame(study=c(1,1,1,2,2,2,2,3,3),
id=1:9)
# Calculate minima
min.id = tapply(X=df$id,
INDEX=df$study,
FUN=min)
# merge this with the data
# (index by name, so this also works when study labels are not 1, 2, 3, ...)
df$min.id = min.id[as.character(df$study)]
# Calculate consecutive id as required
df$esid = df$id - df$min.id + 1
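If the id column has gaps, the subtraction approach no longer gives 1, 2, 3, ... within each study. A minimal base R sketch that does not rely on id at all is to number the rows within each study using ave(). The data frame name df_gaps and its id values are made up for illustration:

```r
# Hypothetical example data: ids within a study are NOT consecutive
df_gaps <- data.frame(study = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                      id    = c(10, 12, 15, 3, 4, 8, 9, 20, 25))

# ave() applies FUN to the row positions of each study separately;
# seq_along() then numbers the rows of that study 1, 2, 3, ...
df_gaps$esid <- ave(seq_along(df_gaps$study), df_gaps$study, FUN = seq_along)

df_gaps$esid
# [1] 1 2 3 1 2 3 4 1 2
```

The numbering follows the row order within each study, so sort the data first if a particular order matters.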
CodePudding user response:
The key is to group_by study, then use row_number:
library(dplyr)
df %>%
group_by(study) %>%
mutate(esid = row_number())
With the example data from @njp, this gives:
# A tibble: 9 × 3
# Groups: study [3]
study id esid
<dbl> <int> <int>
1 1 1 1
2 1 2 2
3 1 3 3
4 2 4 1
5 2 5 2
6 2 6 3
7 2 7 4
8 3 8 1
9 3 9 2
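The pipeline above only prints the result, and the tibble it returns is still grouped by study. A minimal sketch, assuming the same example data and that dplyr is installed, which stores the new column and drops the grouping so later steps are not applied per study by accident:

```r
library(dplyr)

# Same example data as in the first answer
df <- data.frame(study = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                 id    = 1:9)

df <- df %>%
  group_by(study) %>%            # number rows separately for each study
  mutate(esid = row_number()) %>%
  ungroup()                      # remove the grouping once esid exists
```

After this, df$esid holds the within-study counter and df behaves like an ordinary ungrouped tibble.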