How to give a consecutive ID number within each distinct study in R

I am trying to create consecutive ID numbers within each distinct study. I found an example dataset that has such a counter in its esid variable:

Browse[1]> dat <- dat.assink2016
Browse[1]> head(dat, 9)
  study esid id      yi     vi pubstatus year deltype
1     1    1  1  0.9066 0.0740         1  4.5 general
2     1    2  2  0.4295 0.0398         1  4.5 general
3     1    3  3  0.2679 0.0481         1  4.5 general
4     1    4  4  0.2078 0.0239         1  4.5 general
5     1    5  5  0.0526 0.0331         1  4.5 general
6     1    6  6 -0.0507 0.0886         1  4.5 general
7     2    1  7  0.5117 0.0115         1  1.5 general
8     2    2  8  0.4738 0.0076         1  1.5 general
9     2    3  9  0.3544 0.0065         1  1.5 general

I would like to create the same variable for my own data. Can anyone show me how to do it?

CodePudding user response：

If the id column is consecutive (i.e. no jumps or repeated values), you can subtract the minimum value of id within each study and add one:

# Example data
df = data.frame(study=c(1,1,1,2,2,2,2,3,3),
                id=1:9)

# Calculate minima
min.id = tapply(X=df$id,
                INDEX=df$study,
                FUN=min)

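# Map each row to its study's minimum (look up by study name, not by position,
# so this also works when study values are not exactly 1, 2, 3, ...)
df$min.id = as.vector(min.id[as.character(df$study)])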

# Calculate consecutive id as required
df$esid = df$id - df$min.id + 1
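
If id has gaps or repeated values, the subtraction above no longer gives a clean 1, 2, 3, ... sequence. A minimal base R sketch that instead numbers rows by their position within each study, using ave(); the id values below are made up for illustration:

```r
# Example data with gaps in id, so subtracting the minimum would not work
df <- data.frame(study = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
                 id    = c(10, 12, 15, 20, 21, 25, 30, 41, 47))

# Number the rows within each study, following the current row order
df$esid <- ave(seq_along(df$study), df$study, FUN = seq_along)

print(df)
```

This does not look at id at all, so sort the data first if the numbering should follow a particular order.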

CodePudding user response：

The key is to group_by study, then use row_number():

library(dplyr)

df %>% 
    group_by(study) %>%
    mutate(esid = row_number())

With the example data from @njp, this gives:

# A tibble: 9 × 3
# Groups:   study [3]
  study    id  esid
  <dbl> <int> <int>
1     1     1     1
2     1     2     2
3     1     3     3
4     2     4     1
5     2     5     2
6     2     6     3
7     2     7     4
8     3     8     1
9     3     9     2
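
Both answers assume that a numeric study column already exists. If the studies are only identified by a text label, a study number can be built first. A rough base R sketch, where the label column and its values are hypothetical and not taken from the question:

```r
# Hypothetical data: studies identified by a text label, not yet numbered
df <- data.frame(label = c("Smith 2001", "Smith 2001", "Lee 2010",
                           "Lee 2010", "Lee 2010", "Diaz 2015"),
                 stringsAsFactors = FALSE)

# Study number: position of each label among the distinct labels,
# in order of first appearance
df$study <- match(df$label, unique(df$label))

# Within-study counter (same idea as the grouped row number above)
df$esid <- ave(seq_along(df$study), df$study, FUN = seq_along)

print(df)
```

Here match() against unique() numbers the studies in the order they first appear; using factor() instead would number them alphabetically.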