Home > Enterprise >  Sequentially Number Group within Group
Sequentially Number Group within Group

Time:06-09

I would like to order by groups within a group. I need them numbered sequentially from 1:whatever, but need it done by group (ie so Age would be 1, BMI would be 2, and so on and so forth). I can't just assign Age a 1, because not all of my variables have Age and I need to have 1:whatever.

Here is the top of my dataframe:


> print(dft, n = 50)             
 1 33044756 Age                
 2 33044756 Age                
 3 33044756 Age                
 4 33044756 Age                
 5 33044756 Age                
 6 33044756 Age                
 7 33044756 BMI                
 8 33044756 BMI                
 9 33044756 BMI                
10 33044756 BMI                
11 34297066 Age                
12 34297066 Age                
13 34297066 Age                
14 34297066 Age                
15 34297066 BMI                
16 34297066 BMI                
17 34297066 BMI                
18 34297066 BMI                
19 34297066 Duration of disease
20 34297066 Duration of disease
21 34498056 Age                
22 34498056 Age                
23 34498056 Age                
24 34498056 Age                
25 34498056 Age                
26 34498056 Age                
27 34498056 BMI                
28 34498056 BMI                
29 34498056 BMI                
30 34498056 BMI                
31 34748096 Age                
32 34748096 Age                
33 34748096 Age                
34 34748096 Age                
35 34748096 BMI                
36 34748096 BMI                
37 34748096 BMI                
38 34748096 BMI                
39 34748096 Duration of disease
40 34748096 Duration of disease
41 34815219 BMI                
42 34815219 BMI                
43 34815219 BMI                
44 34815219 BMI                
45 34815219 PtGA               
46 34815219 PtGA               
47 34815219 PtGA               
48 34815219 PtGA               
49 34815219 SF36               
50 34815219 SF36     

The cur_group_id() function seemed like the best way, but it numbers ACROSS groups instead of within them. I need the numbering to restart everytime:

 dft %>% group_by(pmid, item) %>% mutate(id = cur_group_id()) %>% print(n=50)
# A tibble: 220 x 3
# Groups:   pmid, item [57]
   pmid     item                   id
   <chr>    <chr>               <int>
 1 33044756 Age                     1
 2 33044756 Age                     1
 3 33044756 Age                     1
 4 33044756 Age                     1
 5 33044756 Age                     1
 6 33044756 Age                     1
 7 33044756 BMI                     2
 8 33044756 BMI                     2
 9 33044756 BMI                     2
10 33044756 BMI                     2
11 34297066 Age                     3
12 34297066 Age                     3
13 34297066 Age                     3
14 34297066 Age                     3
15 34297066 BMI                     4
16 34297066 BMI                     4
17 34297066 BMI                     4
18 34297066 BMI                     4
19 34297066 Duration of disease     5
20 34297066 Duration of disease     5
21 34498056 Age                     6
22 34498056 Age                     6
23 34498056 Age                     6
24 34498056 Age                     6
25 34498056 Age                     6
26 34498056 Age                     6
27 34498056 BMI                     7
28 34498056 BMI                     7
29 34498056 BMI                     7
30 34498056 BMI                     7
31 34748096 Age                     8
32 34748096 Age                     8
33 34748096 Age                     8
34 34748096 Age                     8
35 34748096 BMI                     9
36 34748096 BMI                     9
37 34748096 BMI                     9
38 34748096 BMI                     9
39 34748096 Duration of disease    10
40 34748096 Duration of disease    10
41 34815219 BMI                    11
42 34815219 BMI                    11
43 34815219 BMI                    11
44 34815219 BMI                    11
45 34815219 PtGA                   12
46 34815219 PtGA                   12
47 34815219 PtGA                   12
48 34815219 PtGA                   12
49 34815219 SF36                   13
50 34815219 SF36                   13
# ... with 170 more rows
> 

Using row_number() would also be an option, but it re-starts the numbering WITHIN each subgroup (as opposed to giving each subgroup the same number):

dft %>% group_by(pmid, item) %>% mutate(id = row_number()) %>% print(n=50)
# A tibble: 220 x 3
# Groups:   pmid, item [57]
   pmid     item                   id
   <chr>    <chr>               <int>
 1 33044756 Age                     1
 2 33044756 Age                     2
 3 33044756 Age                     3
 4 33044756 Age                     4
 5 33044756 Age                     5
 6 33044756 Age                     6
 7 33044756 BMI                     1
 8 33044756 BMI                     2
 9 33044756 BMI                     3
10 33044756 BMI                     4
11 34297066 Age                     1
12 34297066 Age                     2
13 34297066 Age                     3
14 34297066 Age                     4
15 34297066 BMI                     1
16 34297066 BMI                     2
17 34297066 BMI                     3
18 34297066 BMI                     4
19 34297066 Duration of disease     1
20 34297066 Duration of disease     2
21 34498056 Age                     1
22 34498056 Age                     2
23 34498056 Age                     3
24 34498056 Age                     4
25 34498056 Age                     5
26 34498056 Age                     6
27 34498056 BMI                     1
28 34498056 BMI                     2
29 34498056 BMI                     3
30 34498056 BMI                     4
31 34748096 Age                     1
32 34748096 Age                     2
33 34748096 Age                     3
34 34748096 Age                     4
35 34748096 BMI                     1
36 34748096 BMI                     2
37 34748096 BMI                     3
38 34748096 BMI                     4
39 34748096 Duration of disease     1
40 34748096 Duration of disease     2
41 34815219 BMI                     1
42 34815219 BMI                     2
43 34815219 BMI                     3
44 34815219 BMI                     4
45 34815219 PtGA                    1
46 34815219 PtGA                    2
47 34815219 PtGA                    3
48 34815219 PtGA                    4
49 34815219 SF36                    1
50 34815219 SF36                    2
# ... with 170 more rows
> 

Here is the output I would want:


 1 33044756 Age                     1
 2 33044756 Age                     1
 3 33044756 Age                     1
 4 33044756 Age                     1
 5 33044756 Age                     1
 6 33044756 Age                     1
 7 33044756 BMI                     2
 8 33044756 BMI                     2
 9 33044756 BMI                     2
10 33044756 BMI                     2
11 34297066 Age                     1
12 34297066 Age                     1
13 34297066 Age                     1
14 34297066 Age                     1
15 34297066 BMI                     2
16 34297066 BMI                     2
17 34297066 BMI                     2
18 34297066 BMI                     2
19 34297066 Duration of disease     3
20 34297066 Duration of disease     3
21 34498056 Age                     1
22 34498056 Age                     1
23 34498056 Age                     1
24 34498056 Age                     1
25 34498056 Age                     1
26 34498056 Age                     1
27 34498056 BMI                     2
28 34498056 BMI                     2
29 34498056 BMI                     2
30 34498056 BMI                     2
31 34748096 Age                     1
32 34748096 Age                     1
33 34748096 Age                     1
34 34748096 Age                     1
35 34748096 BMI                     2
36 34748096 BMI                     2
37 34748096 BMI                     2
38 34748096 BMI                     2
39 34748096 Duration of disease     3
40 34748096 Duration of disease     3
41 34815219 BMI                     1
42 34815219 BMI                     1
43 34815219 BMI                     1
44 34815219 BMI                     1
45 34815219 PtGA                    2
46 34815219 PtGA                    2
47 34815219 PtGA                    2
48 34815219 PtGA                    2
49 34815219 SF36                    3
50 34815219 SF36                    3

CodePudding user response:

One way would be to join your original data to a summarized version of the same data where you assign group id's for the variable in question. Here's an example on a standard data set:

left_join(mtcars, mtcars %>% group_by(gear) %>% summarize(id = cur_group_id()))

Or, using a version of data like yours:

dft <- data.frame(
  stringsAsFactors = FALSE,
              pmid = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
              item = c("Age", "Age", "BMI", "BMI", "Age", "Age", "BMI", "Duration")
)

left_join(
  dft,
  dft %>%
    group_by(item) %>%
    summarize(id = cur_group_id()))

Result

Joining, by = "item"
  pmid     item id
1    1      Age  1
2    1      Age  1
3    1      BMI  2
4    1      BMI  2
5    2      Age  1
6    2      Age  1
7    2      BMI  2
8    2 Duration  3

Please note, for future questions, it will make it easier for others to help you if you can provide some example data as code we can load, as opposed to a printout of what the data looks like. It's often easiest to use the magical dput function, by running dput(dft) or dput(head(dft, 50)) and pasting the output into the body of your question.

  • Related