I need to reshape a long data set (df below) to wide, where multiple variables are the same across long entries for a given ID, and others change by row. The dummy data is as follows:
ID = c("A", "A", "B", "B", "B", "C", "C")
Name = c("mary", "mary", "berry", "berry", "berry", "paul", "paul")
Set = c("set1", "set2", "set1", "set2", "set3", "set1", "set2")
Street = c("123 St", "234 St", "543 St", "492 st", "231 st", "492 st", "231 st")
State = c("al", "nc", "fl", "ca", "md", "tx", "vt")
df = data.frame(ID, Name, Set, Street, State)
I used pivot_wider to reshape it, but I feel like it's out of order from what I want. Since the actual data has 20 Sets for each entry and 7 Variables for each set, is there an easy way to do this while it's reshaping?
It looks like this:
test <- pivot_wider(df, names_from = c("Set"), values_from = c("Street", "State"))
test
# A tibble: 3 x 8
ID Name Street_set1 Street_set2 Street_set3 State_set1 State_set2 State_set3
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 A mary 123 St 234 St NA al nc NA
2 B berry 543 St 492 st 231 st fl ca md
3 C paul 492 st 231 st NA tx vt NA
But what I want is for it to look like this:
ID Name Set1_Street Set1_State Set2_Street Set2_State Set3_Street Set3State
1 A mary 123 St al 234 St nc <NA> <NA>
2 B berry 543 St fl 492 st fl 231 st md
3 C paul 492 st tx 231 st vt <NA> <NA>
I'd also really love your opinions on which option (reshape, spread) is better for large datasets if you have thoughts on that!
Edit: left out the pivot_wider command I used, fixed! Yikes
CodePudding user response:
It may be easier with names_glue
in pivot_wider
library(dplyr)
library(tidyr)
df %>%
pivot_wider(names_from = Set, values_from = c(Street, State),
names_glue = "{tools::toTitleCase(Set)}_{.value}") %>%
dplyr::select(ID, Name, order(readr::parse_number(names(.)[-(1:2)])) 2)
-output
# A tibble: 3 × 8
ID Name Set1_Street Set1_State Set2_Street Set2_State Set3_Street Set3_State
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 A mary 123 St al 234 St nc <NA> <NA>
2 B berry 543 St fl 492 st ca 231 st md
3 C paul 492 st tx 231 st vt <NA> <NA>
CodePudding user response:
Following the suggestions shown here in GitHub issue #839, FR: order of columns resulting from pivot_wider, you can approach these by generating the "spec" manually. This sounded harder to me than it actually was.
Here's what it could look like with your data. You'd first define the spec with build_wider_spec()
and then put that in the order you want for the column names (using arrange()
or something similar). In your case you want to order by "Set". You can see I threw in a names_glue
to change the column names but that step wasn't necessary for this.
Once you have done that, use pivot_wider()
on your dataset using the spec
object you created.
library(tidyr)
library(dplyr)
spec <- build_wider_spec(df, names_from = "Set", values_from = c("Street", "State"),
names_glue = "{Set}_{.value}")
spec <- arrange(spec, Set,)
spec
#> # A tibble: 6 x 3
#> .name .value Set
#> <chr> <chr> <chr>
#> 1 set1_Street Street set1
#> 2 set1_State State set1
#> 3 set2_Street Street set2
#> 4 set2_State State set2
#> 5 set3_Street Street set3
#> 6 set3_State State set3
pivot_wider_spec(df, spec)
#> # A tibble: 3 x 8
#> ID Name set1_Street set1_State set2_Street set2_State set3_Street
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 A mary 123 St al 234 St nc <NA>
#> 2 B berry 543 St fl 492 st ca 231 st
#> 3 C paul 492 st tx 231 st vt <NA>
#> # ... with 1 more variable: set3_State <chr>
Created on 2021-12-17 by the reprex package (v2.0.0)