I am using R to analyze zipcode and council district data in NYC to determine which zip codes are represented by a council district. I have a table with zips and districts that is currently in long format with two columns.
ZIPCODE | CouncilDistrict |
---|---|
11436 | 28 |
11213 | 36 |
11213 | 41 |
11213 | 35 |
11212 | 41 |
11212 | 37 |
11212 | 42 |
But I want the table to be in a wide format. Essentially, since some zip codes can exist in multiple districts, I want each row to represent a unique zipcode and have columns telling me which council district(s) that zipcode is in.
ZIPCODE | 1 | 2 | 3 |
---|---|---|---|
11436 | 28 | NA | NA |
11213 | 36 | 41 | 35 |
11212 | 41 | 37 | 42 |
I have tried all of the following functions with no luck:
testing7 <- dcast(testing, ZIPCODE ~ CounDist)
testing33 <- reshape(testing, idvar = "ZIPCODE", timevar = "ZIPCODE", direction = "wide")
testing33 <- testing %>%
rownames_to_column() %>%
pivot_longer(!rowname, names_to = "ZIPCODE", values_to = "CounDist")%>%
pivot_wider(names_from = "rowname", values_from = "CounDist")
The closest I have gotten is a 1 x 558 table with each column representing a zip code and the each cell being a list of each district that zip code is in.
How do I get my table into this wide format?
CodePudding user response:
One option to achieve your desired result would be to group by ZIPCODE
and add an id
column before applying pivot_wider
:
library(tidyr)
library(dplyr)
testing %>%
group_by(ZIPCODE) %>%
mutate(id = row_number()) %>%
pivot_wider(names_from = "id", values_from = "CouncilDistrict")
#> # A tibble: 3 × 4
#> # Groups: ZIPCODE [3]
#> ZIPCODE `1` `2` `3`
#> <int> <int> <int> <int>
#> 1 11436 28 NA NA
#> 2 11213 36 41 35
#> 3 11212 41 37 42
DATA
testing <- data.frame(
ZIPCODE = c(11436L, 11213L, 11213L, 11213L, 11212L, 11212L, 11212L),
CouncilDistrict = c(28L, 36L, 41L, 35L, 41L, 37L, 42L)
)