Home > Back-end >  Pivot_longer() with multiple new columns
Pivot_longer() with multiple new columns

Time:07-29

I have a question about tidyr::pivot_longer().

Lets say I start with an imaginary dataframe. Imagine people were asked two Questions about two Genres. SQ1_2 would be the second question about the first genre.

set.seed(1234)
genres <- c("Crime", "Horror", "Love", "Sci-Fi", NA)
wide <- data.frame(
  ID = 1:10,
  Genre_1 = sample(genres, 10, replace = TRUE),
  Genre_2 = sample(genres, 10, replace = TRUE),
  SQ1_1 = sample(1:5, 10, replace = TRUE),
  SQ1_2 = sample(1:5, 10, replace = TRUE),
  SQ2_1 = sample(1:5, 10, replace = TRUE),
  SQ2_2 = sample(1:5, 10, replace = TRUE)
)
ID Genre_1 Genre_2 SQ1_1 SQ1_2 SQ2_1 SQ2_2
1 Sci-Fi Sci-Fi 3 5 2 2
2 Horror Sci-Fi 4 1 3 5

I strive to achieve an end result like this:

ID time Genre SQ1 SQ2
1 Genre_1 Sci-Fi 3 5
1 Genre_2 Sci-Fi 2 2
2 Genre_1 Horror 4 1
2 Genre_2 Sci-Fi 3 5

I used to do this task the following way:

long1 <- wide |> tidyr::pivot_longer(col = starts_with("Genre"),
                              names_to = "time",
                              values_to = "genre")
long2 <- wide |> tidyr::pivot_longer(col = c(SQ1_1, SQ2_1),
                              names_to = "time",
                              values_to = "SQ1")
long3 <- wide |> tidyr::pivot_longer(col = c(SQ1_2, SQ2_2),
                              names_to = "time",
                              values_to = "SQ2")
long <- long1 |> 
  dplyr::select(ID, time:genre) |>
  dplyr::mutate(SQ1 = long2$SQ1,
         SQ2 = long3$SQ2)

I was just curious if there is any way to create multiple new columns from different variables with one call of pivot_longer() (or any other function for that matter).

Any help is much appreciated! :)

CodePudding user response:

You need to make the naming rule consistent for those columns you want to pivot into longer format:

  • The j-th genre = Genre_j
  • The i-th question about the j-th genre = SQi_j

Then stack Genre, SQ1, SQ2 respectively:

library(dplyr)
library(tidyr)

wide %>%
  rename_with(~ sub('(\\d)_(\\d)', '\\2_\\1', .x), starts_with('SQ')) %>%
  pivot_longer(-1,
               names_to = c(".value", "Time"),
               names_sep = '_')

# # A tibble: 20 × 5
#       ID Time  Genre    SQ1   SQ2
#    <int> <chr> <chr>  <int> <int>
#  1     1 1     Sci-Fi     3     5
#  2     1 2     Sci-Fi     2     2
#  3     2 1     Horror     4     1
#  4     2 2     Sci-Fi     3     5
  • Related