Home > Blockchain >  Pivot dataframe to get baseline scores as columns
Pivot dataframe to get baseline scores as columns

Time:11-01

I have some trial data where each participant is assessed for multiple scores multiple times, and it is in long format. For example:

set.seed(1)
data <- data.frame(patient = rep(c(100, 101, 102, 103), 3),
                   group = rep(c("A", "B", "A", "B"), 3),
                   time = rep(c(0, 1, 2), each = 4),
                   score_a = runif(12),
                   score_b = runif(12))
print(data, digits = 2)

   patient group time score_a score_b
1      100     A    0   0.266    0.69
2      101     B    0   0.372    0.38
3      102     A    0   0.573    0.77
4      103     B    0   0.908    0.50
5      100     A    1   0.202    0.72
6      101     B    1   0.898    0.99
7      102     A    1   0.945    0.38
8      103     B    1   0.661    0.78
9      100     A    2   0.629    0.93
10     101     B    2   0.062    0.21
11     102     A    2   0.206    0.65
12     103     B    2   0.177    0.13

I would like to pivot this data in some way that I keep it in long format, but extract the baseline (time = 0) measurements as separate columns. So, the new columns would be:

patient group score_a_0 score_b_0 time score_a score_b

With the example dataset, after the weird pivot it would have 8 rows (time 1 and time 2, for 4 patients).

This way, at each timepoint I can adjust for baseline. I'm looking for a general solution, as the actual dataset has 7 scores and 6 followups. I'm assuming the solution will involve a combination of pivot_wider and pivot_longer but I can't make it work.

The ideal output would be:

  patient group time    score_a   score_b score_a_0 score_b_0
1     100     A    1 0.20168193 0.7176185 0.2655087 0.6870228
2     101     B    1 0.89838969 0.9919061 0.3721239 0.3841037
3     102     A    1 0.94467527 0.3800352 0.5728534 0.7698414
4     103     B    1 0.66079779 0.7774452 0.9082078 0.4976992
5     100     A    2 0.62911404 0.9347052 0.2655087 0.6870228
6     101     B    2 0.06178627 0.2121425 0.3721239 0.3841037
7     102     A    2 0.20597457 0.6516738 0.5728534 0.7698414
8     103     B    2 0.17655675 0.1255551 0.9082078 0.4976992

CodePudding user response:

Grouped by 'patient', create the _0 'score' column by subsetting the values of 'score_' columns where 'time' is 0 by looping across the 'score_' columns, then filter out the rows where 'time' is 0

library(dplyr)
data %>% 
   group_by(patient) %>%
   mutate(across(starts_with('score_'), ~ .x[time == 0],
     .names = "{.col}_0")) %>%
  ungroup %>% 
  filter(time != 0) 

-output

 # A tibble: 8 × 7
  patient group  time score_a score_b score_a_0 score_b_0
    <dbl> <chr> <dbl>   <dbl>   <dbl>     <dbl>     <dbl>
1     100 A         1  0.202    0.718     0.266     0.687
2     101 B         1  0.898    0.992     0.372     0.384
3     102 A         1  0.945    0.380     0.573     0.770
4     103 B         1  0.661    0.777     0.908     0.498
5     100 A         2  0.629    0.935     0.266     0.687
6     101 B         2  0.0618   0.212     0.372     0.384
7     102 A         2  0.206    0.652     0.573     0.770
8     103 B         2  0.177    0.126     0.908     0.498
  • Related