I want to pivot multiple wide-format columns to long format using dplyr syntax.
My data looks as follows:
x <- data.frame(
  provider_id = c(1, 2, 3),
  code_1 = c("207ZP0102X", "208600000X", "208100000X"),
  primary = c("y", "n", "n"),
  code_2 = c("208000000X", "207ZP0102X", "208600000X"),
  primary = c("n", "n", "y"),
  code_3 = c("208100000X", "208600000X", "207ZP0102X"),
  primary = c("n", "y", "n")
)
I am hoping to convert to the following format, but I can't figure out the dplyr syntax to achieve this.
Any help would be greatly appreciated.
CodePudding user response:
You could rename your columns before applying pivot_longer:
library(dplyr)
library(tidyr)
x %>%
  rename(primary_1 = primary, primary_2 = primary.1, primary_3 = primary.2) %>%
  pivot_longer(-provider_id, names_to = c(".value", "Code"), names_sep = "_") %>%
  rename(value = code) %>%
  mutate(Code = paste0("Code_", Code))
#> # A tibble: 9 × 4
#>   provider_id Code   value      primary
#>         <dbl> <chr>  <chr>      <chr>
#> 1           1 Code_1 207ZP0102X y
#> 2           1 Code_2 208000000X n
#> 3           1 Code_3 208100000X n
#> 4           2 Code_1 208600000X n
#> 5           2 Code_2 207ZP0102X n
#> 6           2 Code_3 208600000X y
#> 7           3 Code_1 208100000X n
#> 8           3 Code_2 208600000X y
#> 9           3 Code_3 207ZP0102X n
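In case it is unclear where `primary.1` and `primary.2` come from: with its default `check.names = TRUE`, `data.frame()` makes duplicate column names unique, so the three `primary` columns in the question end up with distinct names. A quick base R check, rebuilding the question's `x` so the snippet runs on its own:

```r
# Rebuild the data frame from the question; note the three columns all named `primary`.
x <- data.frame(
  provider_id = c(1, 2, 3),
  code_1 = c("207ZP0102X", "208600000X", "208100000X"),
  primary = c("y", "n", "n"),
  code_2 = c("208000000X", "207ZP0102X", "208600000X"),
  primary = c("n", "n", "y"),
  code_3 = c("208100000X", "208600000X", "207ZP0102X"),
  primary = c("n", "y", "n")
)

# data.frame() has de-duplicated the repeated names by appending .1, .2
names(x)
#> [1] "provider_id" "code_1"      "primary"     "code_2"      "primary.1"
#> [6] "code_3"      "primary.2"
```

Renaming these to `primary_1`, `primary_2`, `primary_3` gives every column the same `name_number` shape, which is what lets `names_sep = "_"` split them into the `.value` part and the `Code` part.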
CodePudding user response:
The problem here is mostly that the names aren't consistent. You could write a function to rename the subset of columns which start with "primary" to "primary_1", "primary_2", ... numbered in the order they appear.
Then you'd be able to apply the pivot_longer code provided by @stefan to tables with more or fewer pairs of (code_xx, primary.xx) columns:
library(tidyr)
library(dplyr, warn.conflicts = FALSE)
fix_names <- function(nms) {
  # Columns starting with primary should be named primary_1, primary_2, ...
  is_primary <- grepl('^primary', nms)
  # seq_len() counts 1..n and is empty when no column matches
  replace(nms, is_primary, paste0('primary', '_', seq_len(sum(is_primary))))
}
x %>%
  rename_with(fix_names) %>%
  pivot_longer(-provider_id, names_to = c(".value", "Code"), names_sep = "_") %>%
  rename(value = code) %>%
  mutate(Code = paste0("Code_", Code))
#> # A tibble: 9 × 4
#>   provider_id Code   value      primary
#>         <dbl> <chr>  <chr>      <chr>
#> 1           1 Code_1 207ZP0102X y
#> 2           1 Code_2 208000000X n
#> 3           1 Code_3 208100000X n
#> 4           2 Code_1 208600000X n
#> 5           2 Code_2 207ZP0102X n
#> 6           2 Code_3 208600000X y
#> 7           3 Code_1 208100000X n
#> 8           3 Code_2 208600000X y
#> 9           3 Code_3 207ZP0102X n
Created on 2022-03-11 by the reprex package (v2.0.1)
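To see that the helper does not depend on there being exactly three pairs, it can be tried on plain name vectors. This is only a sketch: `fix_names` is redefined here so the snippet runs standalone (using `seq_len()` as a defensive choice), and the two-pair and four-pair name vectors are made up for illustration.

```r
# Standalone copy of the helper: renames every column starting with "primary"
# to primary_1, primary_2, ... in order of appearance.
fix_names <- function(nms) {
  is_primary <- grepl("^primary", nms)
  replace(nms, is_primary, paste0("primary_", seq_len(sum(is_primary))))
}

# Hypothetical table with only two (code, primary) pairs
fix_names(c("provider_id", "code_1", "primary", "code_2", "primary.1"))
#> [1] "provider_id" "code_1"      "primary_1"   "code_2"      "primary_2"

# Hypothetical table with four pairs
four <- c("provider_id",
          "code_1", "primary", "code_2", "primary.1",
          "code_3", "primary.2", "code_4", "primary.3")
fix_names(four)[c(3, 5, 7, 9)]
#> [1] "primary_1" "primary_2" "primary_3" "primary_4"
```

Note that the numbering follows column order, so this assumes each `primary` column sits next to the `code_N` column it belongs to, as in the question's data.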