I have a wide dataset with ~300 columns for ~150 biomarkers - each biomarker has a baseline value and a final value. I need to create a change variable (final - baseline) for each biomarker. I have been doing this manually when I have 10 variables, but I'm hoping for a more automated solution. This is what I'm currently doing:
df$marker1_chg = df$marker1_final - df$marker1_bl
df$marker2_chg = df$marker2_final - df$marker2_bl
And here is an example dataset:
data <- data.frame(c(1, 2, 3, 4, 5),
c(2, 2.3, 3.4, 3.6, 4.1),
c(3.1, 4.3, 4.4, 2.3, 4.6),
c(2.2, 5.4, 6.9, 3.9, 2.1),
c(3.5, 4.1, 6.7, 5.4, 3.8))
colnames(data) <- c('id', 'marker1_bl', 'marker1_final', 'marker2_bl', 'marker2_final')
CodePudding user response:
Try this
lst <- list()
for(i in 1:150){
lst[[i]] <- data[[paste0("marker" , i , "_final")]] - data[[paste0("marker" , i , "_bl")]]
}
names(lst) <- paste0("marker" , 1:150 , "_ch")
ans <- data.frame(lst)
CodePudding user response:
Another possible solution:
library(tidyverse)
map2_dfc(select(data, ends_with("_final")) %>% relocate(order(names(.))),
select(data, ends_with("_bl")) %>% relocate(order(names(.))),
~ .x - .y) %>%
set_names(str_replace(names(.), "final", "chg")) %>%
bind_cols(data, .)
#> id marker1_bl marker1_final marker2_bl marker2_final marker1_chg marker2_chg
#> 1 1 2.0 3.1 2.2 3.5 1.1 1.3
#> 2 2 2.3 4.3 5.4 4.1 2.0 -1.3
#> 3 3 3.4 4.4 6.9 6.7 1.0 -0.2
#> 4 4 3.6 2.3 3.9 5.4 -1.3 1.5
#> 5 5 4.1 4.6 2.1 3.8 0.5 1.7