Home > Software engineering >  How to reshape data to long format by combining multiple columns and repeating rows?
How to reshape data to long format by combining multiple columns and repeating rows?

Time:05-03

I have the below example data and I'm aiming to convert it to long format.

dat <- data.frame(
  
  subject = rep(paste("subject", 1:10), each = 4),
  prac1 = round(rnorm(40, 0.25, 0.03), 3),
  prac2 = round(rnorm(40, 0.25, 0.03), 3),
  prac3 = round(rnorm(40, 0.25, 0.03), 3),
  prac1_vs_crit1 = round(rnorm(40, 0.01, 0.005), 3),
  prac2_vs_crit1 = round(rnorm(40, 0.01, 0.005), 3),
  prac3_vs_crit1 = round(rnorm(40, 0.01, 0.005), 3),
  prac1_vs_crit2 = round(rnorm(40, 0.01, 0.005), 3),
  prac2_vs_crit2 = round(rnorm(40, 0.01, 0.005), 3),
  prac3_vs_crit2 = round(rnorm(40, 0.01, 0.005), 3)
  
)

Columns 2-4 are practical estimates and 5-10 are differences between the practical (prac1, prac2, prac3) and two different criterions (crit1/crit2). Each subject has four rows to represent four different trials.

    subject prac1 prac2 prac3 prac1_vs_crit1 prac2_vs_crit1 prac3_vs_crit1
1 subject 1 0.261 0.245 0.257          0.011          0.008          0.008
2 subject 1 0.196 0.277 0.269          0.012          0.002          0.019
3 subject 1 0.235 0.249 0.189          0.010          0.006          0.010
4 subject 1 0.217 0.186 0.261          0.007          0.003          0.015
5 subject 2 0.269 0.225 0.218          0.013          0.009          0.017
6 subject 2 0.254 0.245 0.304          0.016          0.008          0.006
  prac1_vs_crit2 prac2_vs_crit2 prac3_vs_crit2
1          0.013          0.008          0.016
2          0.008          0.010          0.007
3          0.002          0.008          0.006
4          0.005          0.012          0.013
5          0.009          0.011          0.003
6          0.011          0.012          0.026

I'd like to transform my data to long format so it looks like this:

    subject     comparison difference prac_value
1 subject 1 prac1_vs_crit1      0.011      0.261
2 subject 1 prac2_vs_crit1      0.008      0.245
3 subject 1 prac3_vs_crit1      0.008      0.257
4 subject 1 prac1_vs_crit2      0.013      0.261
5 subject 1 prac2_vs_crit2      0.008      0.245
6 subject 1 prac3_vs_crit2      0.016      0.257

Where each subject has 24 rows (6 rows per trial) with prac_value representing the value that coincides with the column named after the first separator ("_") in the original format. For example, for subject 1 and prac1_vs_crit1, the prac_value should be 0.261 as prac1 in the original format had this value.

Where I'm stuck is how to approach this problem, specifically as it relates to achieving the desired output for the prac_value column. Will appreciate any help!

CodePudding user response:

You could achieve your desired result by first renaming your columns in a consistent manner and two pivot_longers like so:

set.seed(123)

library(tidyr)
library(dplyr, quietly = TRUE, warn.conflicts = FALSE)

dat_long <- dat |> 
  rename_with(.fn = ~ paste0(.x, "_value"), matches("^prac\\d $")) |> 
  rename_with(.fn = ~ gsub("_vs", "", .x)) |> 
  pivot_longer(-subject, names_to = c("prac", ".value"), names_sep = "_") |> 
  pivot_longer(c(crit1, crit2), names_to = "crit", values_to = "difference") |> 
  unite(comparison, prac, crit, sep = "_vs_") |> 
  rename(prac_value = value)

head(dat_long)
#> # A tibble: 6 × 4
#>   subject   comparison     prac_value difference
#>   <chr>     <chr>               <dbl>      <dbl>
#> 1 subject 1 prac1_vs_crit1      0.233      0.011
#> 2 subject 1 prac1_vs_crit2      0.233      0.006
#> 3 subject 1 prac2_vs_crit1      0.229      0.015
#> 4 subject 1 prac2_vs_crit2      0.229      0.002
#> 5 subject 1 prac3_vs_crit1      0.25       0.021
#> 6 subject 1 prac3_vs_crit2      0.25       0.011

count(dat_long, subject)
#> # A tibble: 10 × 2
#>    subject        n
#>    <chr>      <int>
#>  1 subject 1     24
#>  2 subject 10    24
#>  3 subject 2     24
#>  4 subject 3     24
#>  5 subject 4     24
#>  6 subject 5     24
#>  7 subject 6     24
#>  8 subject 7     24
#>  9 subject 8     24
#> 10 subject 9     24
  •  Tags:  
  • r
  • Related