I have a dataset in which each individual has 3 possible readings of systolic blood pressure (SBP) and 3 possible readings of diastolic blood pressure (DBP):
a = data.frame(
ID = c(1:10),
SBP1 = c(120, 121, 122, as.numeric(NA), 123, 124, 145, as.numeric(NA), 101, 110),
SBP2 = c(134, 124, as.numeric(NA), as.numeric(NA), 102, 133, 123, as.numeric(NA), as.numeric(NA), 109),
SBP3 = c(111, 123, as.numeric(NA), as.numeric(NA), as.numeric(NA), 133, 132, 111, 110, 123),
DBP1 = c(89, 90, 87, as.numeric(NA), 65, 98, 80, as.numeric(NA), 66, 65),
DBP2 = c(90, 92, as.numeric(NA), as.numeric(NA), 65, 78, 88, as.numeric(NA), as.numeric(NA), 91),
DBP3 = c(91, 93, as.numeric(NA), as.numeric(NA), as.numeric(NA), 92, 78, 88, 88, 54)
)
I would like to create two new variables (one for the SBP called 'SBP_new', and the other for the DBP called 'DBP_new') using the following rules:
- If all 3 SBP/DBP readings are present, take the median (e.g., for ID1, SBP_new = 120 and DBP_new = 90)
- If 2 of the 3 SBP/DBP readings are present, take their mean (e.g., for ID5, SBP_new = (123 + 102)/2 and DBP_new = (65 + 65)/2)
- If only 1 SBP/DBP pair is available, take that pair (e.g., for ID3, SBP_new = 122 and DBP_new = 87)
- Finally, if all readings are NA, assign NA (e.g., for ID4, SBP_new = NA and DBP_new = NA)
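To make the rules concrete, here is a minimal base-R sketch that applies them literally to one set of three readings. The helper name `combine_readings` is hypothetical (it is not from the question or any package); it drops the missing values and then branches on how many readings remain.

```r
# Apply the four rules literally to one set of three readings.
# `combine_readings` is a hypothetical helper, not part of any package.
combine_readings <- function(x) {
  x <- x[!is.na(x)]   # keep only the readings that are present
  n <- length(x)
  if (n == 3) {
    median(x)         # all three present: median
  } else if (n == 2) {
    mean(x)           # two present: mean
  } else if (n == 1) {
    x[[1]]            # one present: take it as is
  } else {
    NA_real_          # none present: NA
  }
}

combine_readings(c(120, 134, 111))  # ID1 SBP
combine_readings(c(123, 102, NA))   # ID5 SBP
combine_readings(c(122, NA, NA))    # ID3 SBP
combine_readings(c(NA, NA, NA))     # ID4 SBP
```

Applied row-wise, e.g. `apply(a[c("SBP1", "SBP2", "SBP3")], 1, combine_readings)`, it should cover the whole data frame without splitting it into subsets.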
I could split the dataset into 4 subsets, do the calculation in each one separately, and then combine them again.
But is there a more efficient way to do this?
Answer:
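As @Ritchie Sacramento says in his comment on the question, compute the median in all cases: the median of two values is their mean, and the median of a single value is that value. Just remove the NAs only when at least one value is not NA, so that rows with no readings stay NA.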
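The reasoning can be checked directly in base R: with three values `median()` returns the middle one, with two it returns their mean, and with one it returns that value. The readings below are taken from IDs 1, 5, 3 and 4 of the example data. Note that base R's `median()` with `na.rm = TRUE` also returns NA when every value is NA, so the `any(!is.na(x))` guard mainly makes that intent explicit.

```r
# How one median() call covers every rule (readings from IDs 1, 5, 3 and 4)
three <- median(c(120, 134, 111))                              # middle value
two   <- median(c(123, 102, NA), na.rm = TRUE)                 # mean of the two
one   <- median(c(122, NA, NA), na.rm = TRUE)                  # the single value
none  <- median(c(NA_real_, NA_real_, NA_real_), na.rm = TRUE) # nothing left: NA

print(c(three = three, two = two, one = one, none = none))
```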
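# Column positions of the three readings; anchoring the pattern means a re-run
# won't also pick up the SBP_new/DBP_new columns created below
i_sbp <- grep("^SBP[0-9]$", names(a))
i_dbp <- grep("^DBP[0-9]$", names(a))
# Row-wise median; NAs are dropped only when at least one reading is present
# (the \(x) lambda shorthand needs R >= 4.1.0)
a$SBP_new <- apply(a[i_sbp], 1, \(x) median(x, na.rm = any(!is.na(x))))
a$DBP_new <- apply(a[i_dbp], 1, \(x) median(x, na.rm = any(!is.na(x))))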
Created on 2022-05-29 by the reprex package (v2.0.1)
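If you'd rather not repeat the `apply()` line for each measurement, a loop over the two prefixes is one option. This sketch is self-contained (it rebuilds the example data using plain `NA`), uses `function(x)` instead of the `\(x)` shorthand so it also runs on R versions before 4.1.0, and assumes the reading columns are always named like `SBP1`, `SBP2`, `SBP3` (hence the anchored pattern).

```r
# Example data from the question (plain NA is coerced to numeric inside c())
a <- data.frame(
  ID   = 1:10,
  SBP1 = c(120, 121, 122, NA, 123, 124, 145, NA, 101, 110),
  SBP2 = c(134, 124, NA, NA, 102, 133, 123, NA, NA, 109),
  SBP3 = c(111, 123, NA, NA, NA, 133, 132, 111, 110, 123),
  DBP1 = c(89, 90, 87, NA, 65, 98, 80, NA, 66, 65),
  DBP2 = c(90, 92, NA, NA, 65, 78, 88, NA, NA, 91),
  DBP3 = c(91, 93, NA, NA, NA, 92, 78, 88, 88, 54)
)

# For each measurement, find its three reading columns and take the
# row-wise median, dropping NAs only when at least one reading is present
for (p in c("SBP", "DBP")) {
  cols <- grep(paste0("^", p, "[0-9]$"), names(a))
  a[[paste0(p, "_new")]] <- apply(a[cols], 1, function(x) median(x, na.rm = any(!is.na(x))))
}

a[c("ID", "SBP_new", "DBP_new")]
```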