Use replicate to create new variable

I have the following code:

Ni <- 133     # number of individuals 
MXmeas <- 10   # maximum number of measurements per individual

# simulate number of observations for each individual
Nmeas <- round(runif(Ni, 1, MXmeas))
 
# simulate observation moments (under the assumption that everybody has at least one observation)
obs <- unlist(sapply(Nmeas, function(x) c(1, sort(sample(2:MXmeas, x-1, replace = FALSE)))))
 
# set up dataframe (id, observations)
dat <- data.frame(ID = rep(1:Ni, times = Nmeas), observations = obs)

This results in the following output:

ID observations
1             1
1             3
1             4
1             5
1             6
1             8

However, I also want a variable 'times' that numbers the measurements within each individual (1 for the first measurement, 2 for the second, and so on). Since every ID has a different number of rows, I am not sure how to implement this. Does anybody know how to do that? I want it to look like this:

ID observations times
1             1     1
1             3     2
1             4     3
1             5     4
1             6     5
1             8     6

Answer:

Using dplyr you could group by ID and use the row number for times:

library(dplyr)

dat |>
  group_by(ID) |>
  mutate(times = row_number()) |>
  ungroup()

In base R we could build the counter from the run lengths of the ID variable (this assumes that all rows of an ID are adjacent, as they are here):

dat$times <- sequence(rle(dat$ID)$lengths)
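To see what the base R one-liner does, here is a small sketch on a made-up ID vector (the vectors `id` and `id_unsorted` are hypothetical and not part of the question's data). It also shows `ave()` as one possible alternative if the rows of an ID are not adjacent, since `rle()` only counts consecutive runs:

```r
# Toy ID vector: three individuals with 3, 1 and 2 rows (made-up data)
id <- c(1, 1, 1, 2, 3, 3)

# rle() gives the lengths of runs of consecutive equal values
runs <- rle(id)$lengths      # 3 1 2

# sequence() turns each run length n into 1, 2, ..., n and concatenates them
times <- sequence(runs)      # 1 2 3 1 1 2

# ave() groups by value rather than by run, so it still numbers the rows
# per ID when the rows of one ID are scattered
id_unsorted <- c(1, 2, 1, 3, 1, 3)
times_unsorted <- ave(seq_along(id_unsorted), id_unsorted, FUN = seq_along)
# 1 1 2 1 3 2
```

For the data in the question the IDs come from `rep()` and are already contiguous, so both approaches should give the same counter.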

Output:

# A tibble: 734 × 3
      ID observations times
   <int>        <dbl> <int>
 1     1            1     1
 2     1            3     2
 3     1            9     3
 4     2            1     1
 5     2            5     2
 6     2            6     3
 7     2            8     4
 8     3            1     1
 9     3            2     2
10     3            5     3
# … with 724 more rows

Data (using a seed):

set.seed(1)
Ni <- 133     # number of individuals 
MXmeas <- 10   # maximum number of measurements per individual

# simulate number of observations for each individual
Nmeas <- round(runif(Ni, 1, MXmeas))

# simulate observation moments (under the assumption that everybody has at least one observation)
obs <- unlist(sapply(Nmeas, function(x) c(1, sort(sample(2:MXmeas, x-1, replace = FALSE)))))

# set up dataframe (id, observations)
dat <- data.frame(ID = rep(1:Ni, times = Nmeas), observations = obs)
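Because the simulation already stores the number of rows per individual in `Nmeas`, the counter could presumably also be built straight from it, without going through `rle()`. The following is only a sketch under that assumption; it repeats the simulation so that it runs on its own, and the name `times_from_rle` is introduced here purely for the comparison:

```r
set.seed(1)
Ni <- 133      # number of individuals
MXmeas <- 10   # maximum number of measurements per individual

# same simulation as above
Nmeas <- round(runif(Ni, 1, MXmeas))
obs <- unlist(sapply(Nmeas, function(x) c(1, sort(sample(2:MXmeas, x - 1, replace = FALSE)))))
dat <- data.frame(ID = rep(1:Ni, times = Nmeas), observations = obs)

# each individual i contributes Nmeas[i] rows, so sequence(Nmeas)
# yields 1..Nmeas[i] for every individual in row order
dat$times <- sequence(Nmeas)

# compare with the run-length version
times_from_rle <- sequence(rle(dat$ID)$lengths)
all(dat$times == times_from_rle)
```

This shortcut only applies while `Nmeas` is still available and the rows have not been reordered; for an existing data frame the grouped approaches are the safer choice.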