I have some trial data where each participant is assessed for multiple scores multiple times, and it is in long format. For example:
set.seed(1)
data <- data.frame(patient = rep(c(100, 101, 102, 103), 3),
group = rep(c("A", "B", "A", "B"), 3),
time = rep(c(0, 1, 2), each = 4),
score_a = runif(12),
score_b = runif(12))
print(data, digits = 2)
patient group time score_a score_b
1 100 A 0 0.266 0.69
2 101 B 0 0.372 0.38
3 102 A 0 0.573 0.77
4 103 B 0 0.908 0.50
5 100 A 1 0.202 0.72
6 101 B 1 0.898 0.99
7 102 A 1 0.945 0.38
8 103 B 1 0.661 0.78
9 100 A 2 0.629 0.93
10 101 B 2 0.062 0.21
11 102 A 2 0.206 0.65
12 103 B 2 0.177 0.13
I would like to pivot this data in some way that I keep it in long format, but extract the baseline (time = 0) measurements as separate columns. So, the new columns would be:
patient group score_a_0 score_b_0 time score_a score_b
With the example dataset, after the weird pivot it would have 8 rows (time 1 and time 2, for 4 patients).
This way, at each timepoint I can adjust for baseline. I'm looking for a general solution, as the actual dataset has 7 scores and 6 followups. I'm assuming the solution will involve a combination of pivot_wider
and pivot_longer
but I can't make it work.
The ideal output would be:
patient group time score_a score_b score_a_0 score_b_0
1 100 A 1 0.20168193 0.7176185 0.2655087 0.6870228
2 101 B 1 0.89838969 0.9919061 0.3721239 0.3841037
3 102 A 1 0.94467527 0.3800352 0.5728534 0.7698414
4 103 B 1 0.66079779 0.7774452 0.9082078 0.4976992
5 100 A 2 0.62911404 0.9347052 0.2655087 0.6870228
6 101 B 2 0.06178627 0.2121425 0.3721239 0.3841037
7 102 A 2 0.20597457 0.6516738 0.5728534 0.7698414
8 103 B 2 0.17655675 0.1255551 0.9082078 0.4976992
CodePudding user response:
Grouped by 'patient', create the _0
'score' column by subsetting the values of 'score_' columns where 'time' is 0 by looping across
the 'score_' columns, then filter
out the rows where 'time' is 0
library(dplyr)
data %>%
group_by(patient) %>%
mutate(across(starts_with('score_'), ~ .x[time == 0],
.names = "{.col}_0")) %>%
ungroup %>%
filter(time != 0)
-output
# A tibble: 8 × 7
patient group time score_a score_b score_a_0 score_b_0
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 100 A 1 0.202 0.718 0.266 0.687
2 101 B 1 0.898 0.992 0.372 0.384
3 102 A 1 0.945 0.380 0.573 0.770
4 103 B 1 0.661 0.777 0.908 0.498
5 100 A 2 0.629 0.935 0.266 0.687
6 101 B 2 0.0618 0.212 0.372 0.384
7 102 A 2 0.206 0.652 0.573 0.770
8 103 B 2 0.177 0.126 0.908 0.498