pivoting multiple columns long using dplyr

Time:03-12

I wish to pivot multiple columns from wide to long format using dplyr syntax.

My data looks as follows:

x <- data.frame(
  provider_id = c(1, 2, 3),
  code_1 = c("207ZP0102X", "208600000X", "208100000X"),
  primary = c("y", "n", "n"),
  code_2 = c("208000000X", "207ZP0102X", "208600000X"),
  primary = c("n", "n", "y"),
  code_3 = c("208100000X", "208600000X", "207ZP0102X"),
  primary = c("n", "y", "n")
)

I am hoping to convert it to a long format with one row per provider and code (columns provider_id, Code, value and primary), but I can't figure out the dplyr syntax to achieve this.

Any help would be greatly appreciated.

CodePudding user response:

Because data.frame() makes the duplicated column names unique (primary, primary.1, primary.2), you could rename those columns before applying pivot_longer:

library(dplyr)
library(tidyr)

x %>% 
  rename(primary_1 = primary, primary_2 = primary.1, primary_3 = primary.2) %>% 
  pivot_longer(-provider_id, names_to = c(".value", "Code"), names_sep = "_") %>% 
  rename(value = code) %>% 
  mutate(Code = paste0("Code_", Code))
#> # A tibble: 9 × 4
#>   provider_id Code   value      primary
#>         <dbl> <chr>  <chr>      <chr>  
#> 1           1 Code_1 207ZP0102X y      
#> 2           1 Code_2 208000000X n      
#> 3           1 Code_3 208100000X n      
#> 4           2 Code_1 208600000X n      
#> 5           2 Code_2 207ZP0102X n      
#> 6           2 Code_3 208600000X y      
#> 7           3 Code_1 208100000X n      
#> 8           3 Code_2 208600000X y      
#> 9           3 Code_3 207ZP0102X n
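
The names primary.1 and primary.2 used in rename() above come from data.frame() itself, which by default (check.names = TRUE) makes repeated column names unique. A minimal base R check, assuming a shortened stand-in for the question's data (the name x_small is hypothetical and not from the original post):

```r
# Shortened stand-in for the question's data (hypothetical name x_small).
x_small <- data.frame(
  provider_id = c(1, 2),
  code_1 = c("207ZP0102X", "208600000X"),
  primary = c("y", "n"),
  code_2 = c("208000000X", "207ZP0102X"),
  primary = c("n", "n")
)

# With the default check.names = TRUE, the second "primary" gets a .1 suffix.
col_names <- names(x_small)
print(col_names)
#> [1] "provider_id" "code_1"      "primary"     "code_2"      "primary.1"
```

If the data were read with check.names = FALSE, or came from a tibble, the duplicated names would be handled differently, so it is worth inspecting names(x) before renaming.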

CodePudding user response:

The problem here is mostly that the names aren't consistent. You could write a function that renames the columns starting with "primary" to "primary_1", "primary_2", ..., numbered in the order they appear.

Then you'd be able to apply the pivot_longer code provided by @stefan to tables with more or fewer pairs of (code_xx, primary.xx) columns:


library(tidyr)
library(dplyr, warn.conflicts = FALSE)

fix_names <- function(nms) {
  # Columns starting with "primary" should be named primary_1, primary_2, ...
  is_primary <- grepl("^primary", nms)
  # seq_len() is safe when no column matches (seq(0) would give c(1, 0))
  replace(nms, is_primary, paste0("primary_", seq_len(sum(is_primary))))
}

x %>% 
  rename_with(fix_names) %>% 
  pivot_longer(-provider_id, names_to = c(".value", "Code"), names_sep = "_") %>% 
  rename(value = code) %>% 
  mutate(Code = paste0("Code_", Code))
#> # A tibble: 9 × 4
#>   provider_id Code   value      primary
#>         <dbl> <chr>  <chr>      <chr>  
#> 1           1 Code_1 207ZP0102X y      
#> 2           1 Code_2 208000000X n      
#> 3           1 Code_3 208100000X n      
#> 4           2 Code_1 208600000X n      
#> 5           2 Code_2 207ZP0102X n      
#> 6           2 Code_3 208600000X y      
#> 7           3 Code_1 208100000X n      
#> 8           3 Code_2 208600000X y      
#> 9           3 Code_3 207ZP0102X n

Created on 2022-03-11 by the reprex package (v2.0.1)
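
To illustrate the point about tables with more pairs, here is a small sketch that applies a renaming helper in the spirit of fix_names to a plain vector of names. The four-pair name vector is a hypothetical example, not data from the question, and the helper is repeated so the snippet runs on its own:

```r
# Renaming helper in the spirit of fix_names, repeated to keep this self-contained.
fix_names <- function(nms) {
  is_primary <- grepl("^primary", nms)
  replace(nms, is_primary, paste0("primary_", seq_len(sum(is_primary))))
}

# Hypothetical column names for a table with four (code, primary) pairs,
# as data.frame() would produce them after deduplicating "primary".
four_pairs <- c("provider_id", "code_1", "primary", "code_2", "primary.1",
                "code_3", "primary.2", "code_4", "primary.3")

renamed <- fix_names(four_pairs)

# Only the "primary" columns change; they are numbered in order of appearance.
print(renamed[grepl("^primary", renamed)])
#> [1] "primary_1" "primary_2" "primary_3" "primary_4"
```

Note that this numbering relies on column order: each primary column is assumed to follow its matching code_xx column, so a table with shuffled columns would pair the values incorrectly.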
