I have a small dataset in which there are 4 data records about student academic progression. The fist column is id, and the second column is grade. My goal is to create a new variable called "GR" (Grade Retention) to identify the students with grade retention. If it is the case, then GR=1, otherwise, GR=0. The resulting dataset should look like below,
structure(list(id = c(1000, 1000, 1000, 1000, 1000, 1000, 1001,
1001, 1001, 1001, 1001, 1001, 1001, 1002, 1002, 1002, 1002, 1002,
1002, 1002, 1002, 1002, 1003, 1003, 1003, 1003, 1003, 1003),
grade = c("1", "2", "3", "4", "5", "6", "1", "2", "3", "4",
"5", "5", "6", "1", "2", "2", "3", "4", "4", "4", "5", "6",
"1", "2", "3", "4", "5", "6"), GR= c("0", "0",
"0", "0", "0", "0", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "0", "0", "0", "0",
"0", "0")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-28L))->df_new
The original data set is structured as follows,
structure(list(id = c(1000, 1000, 1000, 1000, 1000, 1000, 1001,
1001, 1001, 1001, 1001, 1001, 1001, 1002, 1002, 1002, 1002, 1002,
1002, 1002, 1002, 1002, 1003, 1003, 1003, 1003, 1003, 1003),
grade = c("1", "2", "3", "4", "5", "6", "1", "2", "3", "4",
"5", "5", "6", "1", "2", "2", "3", "4", "4", "4", "5", "6",
"1", "2", "3", "4", "5", "6")), row.names = c(NA, -28L), class = c("tbl_df",
"tbl", "data.frame"))->df
Appreciate your kindly help or guidance!
CodePudding user response:
You can check in each group if contains any
duplicated
grade
:
library(tidyverse)
df %>%
group_by(id) %>%
mutate(GR = any(duplicated(grade)))
Output
id grade GR
<dbl> <chr> <int>
1 1000 1 0
2 1000 2 0
3 1000 3 0
4 1000 4 0
5 1000 5 0
6 1000 6 0
7 1001 1 1
8 1001 2 1
9 1001 3 1
10 1001 4 1
# … with 18 more rows
CodePudding user response:
The question isn't posed in a friendly way if you want a Stata solution too, but here is one way to do it. The idea is just to get an indicator for any repeated grade and then to spread it to other observations (records, rows) for that identifier.
* Example generated by -dataex-. For more info, type help dataex
clear
input int id byte grade float GR
1000 1 0
1000 2 0
1000 3 0
1000 4 0
1000 5 0
1000 6 0
1001 1 1
1001 2 1
1001 3 1
1001 4 1
1001 5 1
1001 5 1
1001 6 1
1002 1 1
1002 2 1
1002 2 1
1002 3 1
1002 4 1
1002 4 1
1002 4 1
1002 5 1
1002 6 1
1003 1 0
1003 2 0
1003 3 0
1003 4 0
1003 5 0
1003 6 0
end
bysort id grade : gen wanted = _N > 1
bysort id (wanted) : replace wanted = wanted[_N]
sort id grade
list, sepby(id)
----------------------------
| id grade GR wanted |
|----------------------------|
1. | 1000 1 0 0 |
2. | 1000 2 0 0 |
3. | 1000 3 0 0 |
4. | 1000 4 0 0 |
5. | 1000 5 0 0 |
6. | 1000 6 0 0 |
|----------------------------|
7. | 1001 1 1 1 |
8. | 1001 2 1 1 |
9. | 1001 3 1 1 |
10. | 1001 4 1 1 |
11. | 1001 5 1 1 |
12. | 1001 5 1 1 |
13. | 1001 6 1 1 |
|----------------------------|
14. | 1002 1 1 1 |
15. | 1002 2 1 1 |
16. | 1002 2 1 1 |
17. | 1002 3 1 1 |
18. | 1002 4 1 1 |
19. | 1002 4 1 1 |
20. | 1002 4 1 1 |
21. | 1002 5 1 1 |
22. | 1002 6 1 1 |
|----------------------------|
23. | 1003 1 0 0 |
24. | 1003 2 0 0 |
25. | 1003 3 0 0 |
26. | 1003 4 0 0 |
27. | 1003 5 0 0 |
28. | 1003 6 0 0 |
----------------------------