Probably this is can be done fairly easily, but my own tries did not give the desired output.
I have a simple data frame with some descriptive statistics.
set.seed(0712)
tibble(group = rep(1:3, each = 2),
scale = rep(LETTERS[1:3], 2),
n = c(100, 100, 150, 150, 135, 135),
mean = runif(6, 1, 5),
sd = runif(6))
A tibble: 6 x 5
group scale n mean sd
<int> <chr> <dbl> <dbl> <dbl>
1 1 A 100 2.42 0.387
2 1 B 100 2.27 0.343
3 2 C 150 4.61 0.709
4 2 A 150 4.51 0.207
5 3 B 135 4.93 0.653
6 3 C 135 3.05 0.253
The first column indicates the group, and each group has three values (n, mean, sd) for two different scales so that the number of groups covers all possible scale combinations (e.g. AB, AC, BC).
I would simply like to rearrange my data frame into the following overview in a wide format where the values of each group are placed within two columns per value instead of two rows.
group scale_1 scale_2 n_1 n_2 mean_1 mean_2 sd_1 sd_2
1 1 A B 100 100 2.42 2.27 0.387 0.343
2 2 C A 150 150 4.61 4.51 0.709 0.207
3 3 B C 135 135 4.93 3.05 0.653 0.253
However, using pivot_wider the situation gets messy as every value x scale combination receives its own column, but not every group has values for each scale. I understand that from a tidy data perspective that is useful, but for a simple overview I would rather have the columns lumped together into value_1 and value_2, etc.
> df %>% tidyr::pivot_wider(id_cols = group, names_from = scale, values_from = n:sd)
# A tibble: 3 x 10
group n_A n_B n_C mean_A mean_B mean_C sd_A sd_B sd_C
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 100 100 NA 2.42 2.27 NA 0.387 0.343 NA
2 2 150 NA 150 4.51 NA 4.61 0.207 NA 0.709
3 3 NA 135 135 NA 4.93 3.05 NA 0.653 0.253
Any ideas on how to collapse this data frame?
CodePudding user response:
To get the 1s and 2s in your names rather than A
, B
and C
you could add a new temporary value by row_number()
before pivoting:
library(tidyverse)
set.seed(0712)
df <- tibble(group = rep(1:3, each = 2),
scale = rep(LETTERS[1:3], 2),
n = c(100, 100, 150, 150, 135, 135),
mean = runif(6, 1, 5),
sd = runif(6))
df |>
group_by(group) |>
mutate(scaleno = row_number()) |>
pivot_wider(names_from = scaleno, values_from = scale:sd)
#> # A tibble: 3 × 9
#> # Groups: group [3]
#> group scale_1 scale_2 n_1 n_2 mean_1 mean_2 sd_1 sd_2
#> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 A B 100 100 2.42 2.27 0.387 0.343
#> 2 2 C A 150 150 4.61 4.51 0.709 0.207
#> 3 3 B C 135 135 4.93 3.05 0.653 0.253
This correctly lines up the value 1 and value 2 of each comparison by assigning a universal key, then uses these keys in naming the new columns.