Home > database >  Recode when there is a missing category in R
Recode when there is a missing category in R

Time:07-19

I need a recoding help. Here how my dataset looks like.

df <- data.frame(id = c(1,1,1,1,1, 2,2,2,2,2, 3,3,3,3,3, 4,4,4,4,4),
             score = c(0,1,0,1,0, 0,2,0,2,2, 0,3,3,0,0, 0,1,3,1,3))

    > df
   id score
1   1     0
2   1     1
3   1     0
4   1     1
5   1     0
6   2     0
7   2     2
8   2     0
9   2     2
10  2     2
11  3     0
12  3     3
13  3     3
14  3     0
15  3     0
16  4     0
17  4     1
18  4     3
19  4     1
20  4     3

Some ids have missing score categories. So if this is the case per id, I would like to recode score category. So:

a) if the score options are `0,1,2` and `1` score is missing, then `2` need to be recoded as `1`,
b) if the score options are `0,1,2,3` and `1,2` score is missing, then `3` need to be recoded as `1`,
c) if the score options are `0,1,2,3` and `2` score is missing, then `2,3` need to be recoded as `1,2`,

the idea is there should not be any missing score categories in between.

The desired output would be:

> df.1
   id score score.recoded
1   1     0             0
2   1     1             1
3   1     0             0
4   1     1             1
5   1     0             0
6   2     0             0
7   2     2             1
8   2     0             0
9   2     2             1
10  2     2             1
11  3     0             0
12  3     3             1
13  3     3             1
14  3     0             0
15  3     0             0
16  4     0             0
17  4     1             1
18  4     3             2
19  4     1             1
20  4     3             2

CodePudding user response:

df %>%
   group_by(id)%>%
   mutate(score = as.numeric(factor(score)) - 1)

# A tibble: 20 x 2
# Groups:   id [4]
      id score
   <dbl> <dbl>
 1     1     0
 2     1     1
 3     1     0
 4     1     1
 5     1     0
 6     2     0
 7     2     1
 8     2     0
 9     2     1
10     2     1
11     3     0
12     3     1
13     3     1
14     3     0
15     3     0
16     4     0
17     4     1
18     4     2
19     4     1
20     4     2

CodePudding user response:

Using data.table

library(data.table)
setDT(df)[, score.recoded := 0][
   score >0, score.recoded := match(score, score), id]

-output

> df
       id score score.recoded
    <num> <num>         <int>
 1:     1     0             0
 2:     1     1             1
 3:     1     0             0
 4:     1     1             1
 5:     1     0             0
 6:     2     0             0
 7:     2     2             1
 8:     2     0             0
 9:     2     2             1
10:     2     2             1
11:     3     0             0
12:     3     3             1
13:     3     3             1
14:     3     0             0
15:     3     0             0
16:     4     0             0
17:     4     1             1
18:     4     3             2
19:     4     1             1
20:     4     3             2
  • Related