I am using R to recode several columns. Specifically, I have a data frame in which I would like to recode several columns based on whether they are the same value (1) or not (0) based on a specific observation/observation ID value. For example, I have data that looks like the following:
ID Control1 Control2 Var1 Var2 Var3
1 3 3 7 2 2
2 4 3 4 2 0
3 2 4 7 NA 9
4 2 2 8 4 0
5 2 2 NA 9 2
6 3 3 7 2 0
I would like for it to be recoded as the following:
ID Control1 Control2 Var1 Var2 Var3
1 3 3 1 1 0
2 4 3 0 1 1
3 2 4 1 NA 0
4 2 2 0 0 1
5 2 2 NA 0 0
Essentially, I would like for the selected columns (Var1, Var2, Var3) to be recoded based on whether they match the value for ID #6 in that specific column. For missing data, I use “NA” and I would like to keep it as “NA.” Further, I would like to keep the ID, Control1, and Control2 columns as is. Finally, the data for ID # 6 can be dropped in the end.
I can do this with an individual column; however, the data that I am recoding has around a thousand columns and it would take a long time to do this for each individual column. This seems as if it should be fairly simple. I appreciate any help.
Code to create sample dataframe:
DF <- structure(list(ID = c(1, 2, 3, 4, 5, 6), Control1 = c(3, 4, 2, 2, 2, 3),
Control2 = c(3, 3,4,2,2,3), Var1 = c(7, 4, 7,8,NA, 7),
Var2 = c(2,2, NA, 4, 9,2), Var3 = c(2, 0,9,0,2,0)), class = "data.frame", row.names = c(NA, -6L))
CodePudding user response:
First bring your columns into correct class. We can do this with type.convert
(this is necessary, otherwise you will get problems by defining case_when). So said define your conditions using the last()
function and finally slice remove last row:
library(dplyr)
DF %>%
type.convert(as.is = TRUE) %>%
mutate(across(-c(1:3), ~case_when(. == last(.) ~ 1,
. != last(.) ~ 0,
TRUE ~ NA_real_))) %>%
slice(1:(n()-1))
ID Control1 Control2 Var1 Var2 Var3
1 1 3 3 1 1 0
2 2 4 3 0 1 1
3 3 2 4 1 NA 0
4 4 2 2 0 0 1
5 5 2 2 NA 0 0