Repeat a set of ID's for every "n rows"

Time:03-05

I have this data set in R:

first_variable = rexp(100,100)
second_variable = rexp(100,100)
n_obs = 1:100

question_data = data.frame(n_obs, first_variable, second_variable)

I want to add an id column to this dataset so that:

  • Rows 1-10 have ids 1,2,3,4,5,6,7,8,9,10
  • Rows 11-20 have ids 1,2,3,4,5,6,7,8,9,10
  • Rows 21-30 have ids 1,2,3,4,5,6,7,8,9,10, etc.

In other words, the ids 1-10 repeat for each set of 10 rows.

I found this code that I thought would work:

# here, n = 10 (a set of n = 10 rows)
bloc_len <- 10

question_data$id <- 
  rep(seq(1, 1 + nrow(question_data) %/% bloc_len), each = bloc_len, length.out = nrow(question_data))

But this does not work: it gives every row within a set of 10 rows the same id:

 n_obs first_variable second_variable id
1     1    0.006223412    0.0258968583  1
2     2    0.004473815    0.0065543554  1
3     3    0.011745754    0.0005061101  1
4     4    0.005620351    0.0033549525  1
5     5    0.045860202    0.0132625822  1
6     6    0.002477348    0.0068517981  1

I would have wanted something like this:

 n_obs first_variable second_variable id
1      1   0.0062234115    0.0258968583  1
2      2   0.0044738150    0.0065543554  2
3      3   0.0117457544    0.0005061101  3
4      4   0.0056203508    0.0033549525  4
5      5   0.0458602019    0.0132625822  5
6      6   0.0024773478    0.0068517981  6
7      7   0.0049527013    0.0047461094  7
8      8   0.0058581805    0.0108604478  8
9      9   0.0041171801    0.0002445268  9
10    10   0.0090667287    0.0019289691  10
11    11   0.0039002449    0.0135441919  1
12    12   0.0064558661    0.0230979415  2
13    13   0.0104993267    0.0005609776  3
14    14   0.0153162705    0.0038364012  4
15    15   0.0107109676    0.0183818539  5
16    16   0.0131620151    0.0029710189  6
17    17   0.0244441763    0.0095645480  7
18    18   0.0058112355    0.0125754349  8
19    19   0.0005022588    0.0156614272  9
20    20   0.0007572985    0.0049964333  10
21    21   0.0276024376    0.0024303513  1

Is this possible?

Thank you!

CodePudding user response:

Instead of each, try using times:

question_data$id <- 
  rep(seq(bloc_len), times = nrow(question_data) %/% bloc_len, length.out = nrow(question_data))
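To see why each gave one id per block, it may help to compare the two arguments on a small vector. This is only a sketch: the size n_rows and the names ids_each and ids_times are made up for illustration and are not part of the question's data.

```r
bloc_len <- 3
n_rows <- 6  # small hypothetical size, for illustration only

# each: every element is repeated in place before moving to the next one
ids_each <- rep(seq_len(bloc_len), each = 2)

# times: the whole sequence is repeated end to end
ids_times <- rep(seq_len(bloc_len), times = n_rows %/% bloc_len)

print(ids_each)   # 1 1 2 2 3 3
print(ids_times)  # 1 2 3 1 2 3
```

With each, consecutive rows share an id, which matches the output in the question; with times, the ids cycle as wanted.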

CodePudding user response:

If, as in the example shared, the number of rows in the data (100) is evenly divisible by the number of ids (10), we can rely on R's recycling to repeat the ids.

bloc_len <- 10
question_data$id <- seq_len(bloc_len)

If the number of rows is not evenly divisible by the number of ids, we can use rep with length.out:

question_data$id <- rep(seq_len(bloc_len), length.out = nrow(question_data))
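As a quick check of the non-divisible case, here is a sketch on a hypothetical 25-row data frame (toy_data and id_mod are illustrative names, not from the question). Assigning seq_len(bloc_len) directly would fail here because 25 is not a multiple of 10, while rep with length.out simply cuts the last cycle short. The modulo line is an alternative that should give the same ids.

```r
bloc_len <- 10

# hypothetical data frame with 25 rows, which is not a multiple of bloc_len
toy_data <- data.frame(n_obs = 1:25)

# cycle the ids 1..bloc_len and stop once every row has one
toy_data$id <- rep(seq_len(bloc_len), length.out = nrow(toy_data))

# alternative: modulo arithmetic on the row index
id_mod <- (seq_len(nrow(toy_data)) - 1L) %% bloc_len + 1L

print(tail(toy_data$id, 5))  # ids of rows 21-25: 1 2 3 4 5
```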