I need a recoding help. Here how my dataset looks like.
df <- data.frame(id = c(1,1,1,1,1, 2,2,2,2,2, 3,3,3,3,3, 4,4,4,4,4),
score = c(0,1,0,1,0, 0,2,0,2,2, 0,3,3,0,0, 0,1,3,1,3))
> df
id score
1 1 0
2 1 1
3 1 0
4 1 1
5 1 0
6 2 0
7 2 2
8 2 0
9 2 2
10 2 2
11 3 0
12 3 3
13 3 3
14 3 0
15 3 0
16 4 0
17 4 1
18 4 3
19 4 1
20 4 3
Some id
s have missing score categories. So if this is the case per id
, I would like to recode score category. So:
a) if the score options are `0,1,2` and `1` score is missing, then `2` need to be recoded as `1`,
b) if the score options are `0,1,2,3` and `1,2` score is missing, then `3` need to be recoded as `1`,
c) if the score options are `0,1,2,3` and `2` score is missing, then `2,3` need to be recoded as `1,2`,
the idea is there should not be any missing score
categories in between.
The desired output would be:
> df.1
id score score.recoded
1 1 0 0
2 1 1 1
3 1 0 0
4 1 1 1
5 1 0 0
6 2 0 0
7 2 2 1
8 2 0 0
9 2 2 1
10 2 2 1
11 3 0 0
12 3 3 1
13 3 3 1
14 3 0 0
15 3 0 0
16 4 0 0
17 4 1 1
18 4 3 2
19 4 1 1
20 4 3 2
CodePudding user response:
df %>%
group_by(id)%>%
mutate(score = as.numeric(factor(score)) - 1)
# A tibble: 20 x 2
# Groups: id [4]
id score
<dbl> <dbl>
1 1 0
2 1 1
3 1 0
4 1 1
5 1 0
6 2 0
7 2 1
8 2 0
9 2 1
10 2 1
11 3 0
12 3 1
13 3 1
14 3 0
15 3 0
16 4 0
17 4 1
18 4 2
19 4 1
20 4 2
CodePudding user response:
Using data.table
library(data.table)
setDT(df)[, score.recoded := 0][
score >0, score.recoded := match(score, score), id]
-output
> df
id score score.recoded
<num> <num> <int>
1: 1 0 0
2: 1 1 1
3: 1 0 0
4: 1 1 1
5: 1 0 0
6: 2 0 0
7: 2 2 1
8: 2 0 0
9: 2 2 1
10: 2 2 1
11: 3 0 0
12: 3 3 1
13: 3 3 1
14: 3 0 0
15: 3 0 0
16: 4 0 0
17: 4 1 1
18: 4 3 2
19: 4 1 1
20: 4 3 2