Home > Software design >  Recode Several Columns Based on Observation
Recode Several Columns Based on Observation

Time:12-30

I am using R to recode several columns. Specifically, I have a data frame in which I would like to recode several columns based on whether they are the same value (1) or not (0) based on a specific observation/observation ID value. For example, I have data that looks like the following:

ID  Control1    Control2    Var1    Var2    Var3
1    3            3          7       2       2
2    4            3          4       2       0
3    2            4          7       NA      9
4    2            2          8       4       0
5    2            2         NA       9       2
6    3            3          7       2       0

I would like for it to be recoded as the following:

ID  Control1    Control2    Var1    Var2    Var3
1     3            3         1       1       0
2     4            3         0       1       1
3     2            4         1       NA      0  
4     2            2         0       0       1
5     2            2         NA      0       0

Essentially, I would like for the selected columns (Var1, Var2, Var3) to be recoded based on whether they match the value for ID #6 in that specific column. For missing data, I use “NA” and I would like to keep it as “NA.” Further, I would like to keep the ID, Control1, and Control2 columns as is. Finally, the data for ID # 6 can be dropped in the end.

I can do this with an individual column; however, the data that I am recoding has around a thousand columns and it would take a long time to do this for each individual column. This seems as if it should be fairly simple. I appreciate any help.

Code to create sample dataframe:

DF <- structure(list(ID = c(1, 2, 3, 4, 5, 6), Control1 = c(3, 4, 2, 2, 2, 3), 
Control2 = c(3, 3,4,2,2,3), Var1 = c(7, 4, 7,8,NA, 7), 
Var2 = c(2,2, NA, 4, 9,2), Var3 = c(2, 0,9,0,2,0)), class = "data.frame", row.names = c(NA, -6L))

CodePudding user response:

First bring your columns into correct class. We can do this with type.convert (this is necessary, otherwise you will get problems by defining case_when). So said define your conditions using the last() function and finally slice remove last row:

library(dplyr)

DF %>% 
  type.convert(as.is = TRUE) %>% 
  mutate(across(-c(1:3), ~case_when(. == last(.) ~ 1,
                                    . != last(.) ~ 0, 
                                    TRUE ~ NA_real_))) %>% 
  slice(1:(n()-1))
  ID Control1 Control2 Var1 Var2 Var3
1  1        3        3    1    1    0
2  2        4        3    0    1    1
3  3        2        4    1   NA    0
4  4        2        2    0    0    1
5  5        2        2   NA    0    0
  • Related