I would like to order by groups within a group. I need them numbered sequentially from 1:whatever, but need it done by group (ie so Age would be 1, BMI would be 2, and so on and so forth). I can't just assign Age a 1, because not all of my variables have Age and I need to have 1:whatever.
Here is the top of my dataframe:
> print(dft, n = 50)
1 33044756 Age
2 33044756 Age
3 33044756 Age
4 33044756 Age
5 33044756 Age
6 33044756 Age
7 33044756 BMI
8 33044756 BMI
9 33044756 BMI
10 33044756 BMI
11 34297066 Age
12 34297066 Age
13 34297066 Age
14 34297066 Age
15 34297066 BMI
16 34297066 BMI
17 34297066 BMI
18 34297066 BMI
19 34297066 Duration of disease
20 34297066 Duration of disease
21 34498056 Age
22 34498056 Age
23 34498056 Age
24 34498056 Age
25 34498056 Age
26 34498056 Age
27 34498056 BMI
28 34498056 BMI
29 34498056 BMI
30 34498056 BMI
31 34748096 Age
32 34748096 Age
33 34748096 Age
34 34748096 Age
35 34748096 BMI
36 34748096 BMI
37 34748096 BMI
38 34748096 BMI
39 34748096 Duration of disease
40 34748096 Duration of disease
41 34815219 BMI
42 34815219 BMI
43 34815219 BMI
44 34815219 BMI
45 34815219 PtGA
46 34815219 PtGA
47 34815219 PtGA
48 34815219 PtGA
49 34815219 SF36
50 34815219 SF36
The cur_group_id() function seemed like the best way, but it numbers ACROSS groups instead of within them. I need the numbering to restart everytime:
dft %>% group_by(pmid, item) %>% mutate(id = cur_group_id()) %>% print(n=50)
# A tibble: 220 x 3
# Groups: pmid, item [57]
pmid item id
<chr> <chr> <int>
1 33044756 Age 1
2 33044756 Age 1
3 33044756 Age 1
4 33044756 Age 1
5 33044756 Age 1
6 33044756 Age 1
7 33044756 BMI 2
8 33044756 BMI 2
9 33044756 BMI 2
10 33044756 BMI 2
11 34297066 Age 3
12 34297066 Age 3
13 34297066 Age 3
14 34297066 Age 3
15 34297066 BMI 4
16 34297066 BMI 4
17 34297066 BMI 4
18 34297066 BMI 4
19 34297066 Duration of disease 5
20 34297066 Duration of disease 5
21 34498056 Age 6
22 34498056 Age 6
23 34498056 Age 6
24 34498056 Age 6
25 34498056 Age 6
26 34498056 Age 6
27 34498056 BMI 7
28 34498056 BMI 7
29 34498056 BMI 7
30 34498056 BMI 7
31 34748096 Age 8
32 34748096 Age 8
33 34748096 Age 8
34 34748096 Age 8
35 34748096 BMI 9
36 34748096 BMI 9
37 34748096 BMI 9
38 34748096 BMI 9
39 34748096 Duration of disease 10
40 34748096 Duration of disease 10
41 34815219 BMI 11
42 34815219 BMI 11
43 34815219 BMI 11
44 34815219 BMI 11
45 34815219 PtGA 12
46 34815219 PtGA 12
47 34815219 PtGA 12
48 34815219 PtGA 12
49 34815219 SF36 13
50 34815219 SF36 13
# ... with 170 more rows
>
Using row_number() would also be an option, but it re-starts the numbering WITHIN each subgroup (as opposed to giving each subgroup the same number):
dft %>% group_by(pmid, item) %>% mutate(id = row_number()) %>% print(n=50)
# A tibble: 220 x 3
# Groups: pmid, item [57]
pmid item id
<chr> <chr> <int>
1 33044756 Age 1
2 33044756 Age 2
3 33044756 Age 3
4 33044756 Age 4
5 33044756 Age 5
6 33044756 Age 6
7 33044756 BMI 1
8 33044756 BMI 2
9 33044756 BMI 3
10 33044756 BMI 4
11 34297066 Age 1
12 34297066 Age 2
13 34297066 Age 3
14 34297066 Age 4
15 34297066 BMI 1
16 34297066 BMI 2
17 34297066 BMI 3
18 34297066 BMI 4
19 34297066 Duration of disease 1
20 34297066 Duration of disease 2
21 34498056 Age 1
22 34498056 Age 2
23 34498056 Age 3
24 34498056 Age 4
25 34498056 Age 5
26 34498056 Age 6
27 34498056 BMI 1
28 34498056 BMI 2
29 34498056 BMI 3
30 34498056 BMI 4
31 34748096 Age 1
32 34748096 Age 2
33 34748096 Age 3
34 34748096 Age 4
35 34748096 BMI 1
36 34748096 BMI 2
37 34748096 BMI 3
38 34748096 BMI 4
39 34748096 Duration of disease 1
40 34748096 Duration of disease 2
41 34815219 BMI 1
42 34815219 BMI 2
43 34815219 BMI 3
44 34815219 BMI 4
45 34815219 PtGA 1
46 34815219 PtGA 2
47 34815219 PtGA 3
48 34815219 PtGA 4
49 34815219 SF36 1
50 34815219 SF36 2
# ... with 170 more rows
>
Here is the output I would want:
1 33044756 Age 1
2 33044756 Age 1
3 33044756 Age 1
4 33044756 Age 1
5 33044756 Age 1
6 33044756 Age 1
7 33044756 BMI 2
8 33044756 BMI 2
9 33044756 BMI 2
10 33044756 BMI 2
11 34297066 Age 1
12 34297066 Age 1
13 34297066 Age 1
14 34297066 Age 1
15 34297066 BMI 2
16 34297066 BMI 2
17 34297066 BMI 2
18 34297066 BMI 2
19 34297066 Duration of disease 3
20 34297066 Duration of disease 3
21 34498056 Age 1
22 34498056 Age 1
23 34498056 Age 1
24 34498056 Age 1
25 34498056 Age 1
26 34498056 Age 1
27 34498056 BMI 2
28 34498056 BMI 2
29 34498056 BMI 2
30 34498056 BMI 2
31 34748096 Age 1
32 34748096 Age 1
33 34748096 Age 1
34 34748096 Age 1
35 34748096 BMI 2
36 34748096 BMI 2
37 34748096 BMI 2
38 34748096 BMI 2
39 34748096 Duration of disease 3
40 34748096 Duration of disease 3
41 34815219 BMI 1
42 34815219 BMI 1
43 34815219 BMI 1
44 34815219 BMI 1
45 34815219 PtGA 2
46 34815219 PtGA 2
47 34815219 PtGA 2
48 34815219 PtGA 2
49 34815219 SF36 3
50 34815219 SF36 3
CodePudding user response:
One way would be to join your original data to a summarized version of the same data where you assign group id's for the variable in question. Here's an example on a standard data set:
left_join(mtcars, mtcars %>% group_by(gear) %>% summarize(id = cur_group_id()))
Or, using a version of data like yours:
dft <- data.frame(
stringsAsFactors = FALSE,
pmid = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
item = c("Age", "Age", "BMI", "BMI", "Age", "Age", "BMI", "Duration")
)
left_join(
dft,
dft %>%
group_by(item) %>%
summarize(id = cur_group_id()))
Result
Joining, by = "item"
pmid item id
1 1 Age 1
2 1 Age 1
3 1 BMI 2
4 1 BMI 2
5 2 Age 1
6 2 Age 1
7 2 BMI 2
8 2 Duration 3
Please note, for future questions, it will make it easier for others to help you if you can provide some example data as code we can load, as opposed to a printout of what the data looks like. It's often easiest to use the magical dput
function, by running dput(dft)
or dput(head(dft, 50))
and pasting the output into the body of your question.