Current shape of the data:
# A tibble: 92 × 9
state category `1978` `1983` `1988` `1993` `1999` `2006` `2013`
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Alabama Jail population 3,642 3,838 4,363 7,873 9,327 11,430 10,436
2 Alabama Percent pre-trial 28% 47% 57% 48% 58% 70% 68%
Wanted shape of the data:
state jail pretrial year
Alabama 3642 28% 1978
Alabama 3838 47% 1983
Alabama 4363 57% 1988
Alabama 7873 48% 1993
Alabama 9327 58% 1999
Alabama 11430 70% 2006
Alabama 10436 68% 2013
I've tried various attempts of using dplyr and pivot_wider / pivot_longer and have gotten close but have spent so much time trying to figure this one out. I'm not looking so much for a code recipe to tell me how to do it, but I haven't even been able to find a similar example to go by. If there is a name for this please share that and I can do the lookup and figure out the code, just unsure what to search.
Thanks
CodePudding user response:
Here's a dplyr suggestion:
library(dplyr)
library(tidyr) # pivot_*
dat %>%
mutate(category = gsub(" .*", "", category)) %>%
pivot_longer(-c(state, category), names_to = "year") %>%
pivot_wider(c(state, year), names_from = category, values_from = value)
# # A tibble: 7 x 4
# state year Jail Percent
# <chr> <chr> <chr> <chr>
# 1 Alabama 1978 3,642 28%
# 2 Alabama 1983 3,838 47%
# 3 Alabama 1988 4,363 57%
# 4 Alabama 1993 7,873 48%
# 5 Alabama 1999 9,327 58%
# 6 Alabama 2006 11,430 70%
# 7 Alabama 2013 10,436 68%
You may want to clean up the columns a bit (for numbers, etc), perhaps
dat %>%
mutate(category = gsub(" .*", "", category)) %>%
pivot_longer(-c(state, category), names_to = "year") %>%
pivot_wider(c(state, year), names_from = category, values_from = value) %>%
mutate(across(c(year, Jail), ~ as.integer(gsub("\\D", "", .))))
# # A tibble: 7 x 4
# state year Jail Percent
# <chr> <int> <int> <chr>
# 1 Alabama 1978 3642 28%
# 2 Alabama 1983 3838 47%
# 3 Alabama 1988 4363 57%
# 4 Alabama 1993 7873 48%
# 5 Alabama 1999 9327 58%
# 6 Alabama 2006 11430 70%
# 7 Alabama 2013 10436 68%
(There are many ways to deal with cleaning it up.)
Data
dat <- structure(list(state = c("Alabama", "Alabama"), category = c("Jail population", "Percent pre-trial"), `1978` = c("3,642", "28%"), `1983` = c("3,838", "47%"), `1988` = c("4,363", "57%"), `1993` = c("7,873", "48%"), `1999` = c("9,327", "58%"), `2006` = c("11,430", "70%"), `2013` = c("10,436", "68%")), row.names = c("1", "2"), class = "data.frame")