reordering my reshape: long to wide with pivot_wider, different column order-CodePudding

I need to reshape a long data set (df below) to wide, where multiple variables are the same across long entries for a given ID, and others change by row. The dummy data is as follows:

ID = c("A", "A", "B", "B", "B", "C", "C")
Name = c("mary", "mary", "berry", "berry", "berry", "paul", "paul")
Set = c("set1", "set2", "set1", "set2", "set3", "set1", "set2")
Street = c("123 St", "234 St", "543 St", "492 st", "231 st", "492 st", "231 st")
State = c("al", "nc", "fl", "ca", "md", "tx", "vt")

df = data.frame(ID, Name, Set, Street, State)

I used pivot_wider to reshape it, but I feel like it's out of order from what I want. Since the actual data has 20 Sets for each entry and 7 Variables for each set, is there an easy way to do this while it's reshaping?

It looks like this:

test <- pivot_wider(df, names_from = c("Set"), values_from = c("Street", "State"))
test
# A tibble: 3 x 8
  ID    Name  Street_set1 Street_set2 Street_set3 State_set1 State_set2 State_set3
  <chr> <chr> <chr>       <chr>       <chr>       <chr>      <chr>      <chr>     
1 A     mary  123 St      234 St      NA          al         nc         NA        
2 B     berry 543 St      492 st      231 st      fl         ca         md        
3 C     paul  492 st      231 st      NA          tx         vt         NA

But what I want is for it to look like this:

  ID  Name Set1_Street Set1_State Set2_Street Set2_State Set3_Street Set3State
1  A  mary     123 St        al     234 St        nc       <NA>      <NA>
2  B berry     543 St        fl     492 st        fl    231 st         md
3  C  paul     492 st        tx     231 st        vt       <NA>      <NA>

I'd also really love your opinions on which option (reshape, spread) is better for large datasets if you have thoughts on that!

Edit: left out the pivot_wider command I used, fixed! Yikes

CodePudding user response：

It may be easier with names_glue in pivot_wider

library(dplyr)
library(tidyr)
df %>% 
   pivot_wider(names_from = Set, values_from = c(Street, State), 
       names_glue = "{tools::toTitleCase(Set)}_{.value}") %>%   
   dplyr::select(ID, Name, order(readr::parse_number(names(.)[-(1:2)]))   2)

-output

# A tibble: 3 × 8
  ID    Name  Set1_Street Set1_State Set2_Street Set2_State Set3_Street Set3_State
  <chr> <chr> <chr>       <chr>      <chr>       <chr>      <chr>       <chr>     
1 A     mary  123 St      al         234 St      nc         <NA>        <NA>      
2 B     berry 543 St      fl         492 st      ca         231 st      md        
3 C     paul  492 st      tx         231 st      vt         <NA>        <NA>

CodePudding user response：

Following the suggestions shown here in GitHub issue #839, FR: order of columns resulting from pivot_wider, you can approach these by generating the "spec" manually. This sounded harder to me than it actually was.

Here's what it could look like with your data. You'd first define the spec with build_wider_spec() and then put that in the order you want for the column names (using arrange() or something similar). In your case you want to order by "Set". You can see I threw in a names_glue to change the column names but that step wasn't necessary for this.

Once you have done that, use pivot_wider() on your dataset using the spec object you created.

library(tidyr)
library(dplyr)

spec <- build_wider_spec(df, names_from = "Set", values_from = c("Street", "State"),
                         names_glue = "{Set}_{.value}")
spec <- arrange(spec, Set,)
spec
#> # A tibble: 6 x 3
#>   .name       .value Set  
#>   <chr>       <chr>  <chr>
#> 1 set1_Street Street set1 
#> 2 set1_State  State  set1 
#> 3 set2_Street Street set2 
#> 4 set2_State  State  set2 
#> 5 set3_Street Street set3 
#> 6 set3_State  State  set3

pivot_wider_spec(df, spec) 
#> # A tibble: 3 x 8
#>   ID    Name  set1_Street set1_State set2_Street set2_State set3_Street
#>   <chr> <chr> <chr>       <chr>      <chr>       <chr>      <chr>      
#> 1 A     mary  123 St      al         234 St      nc         <NA>       
#> 2 B     berry 543 St      fl         492 st      ca         231 st     
#> 3 C     paul  492 st      tx         231 st      vt         <NA>       
#> # ... with 1 more variable: set3_State <chr>

^{Created on 2021-12-17 by the reprex package (v2.0.0)}