Home > Blockchain >  How to rearrange data shape with R
How to rearrange data shape with R

Time:09-30

Current shape of the data:

# A tibble: 92 × 9
   state      category           `1978` `1983` `1988` `1993`  `1999` `2006` `2013`
   <chr>      <chr>              <chr>  <chr>  <chr>  <chr>   <chr>  <chr>  <chr> 
 1 Alabama    Jail population    3,642  3,838  4,363  7,873   9,327  11,430 10,436
 2 Alabama    Percent pre-trial  28%    47%    57%    48%     58%    70%    68% 

Wanted shape of the data:

state     jail     pretrial     year
Alabama   3642      28%         1978    
Alabama   3838      47%         1983    
Alabama   4363      57%         1988    
Alabama   7873      48%         1993    
Alabama   9327      58%         1999    
Alabama   11430      70%         2006    
Alabama   10436      68%         2013  

I've tried various attempts of using dplyr and pivot_wider / pivot_longer and have gotten close but have spent so much time trying to figure this one out. I'm not looking so much for a code recipe to tell me how to do it, but I haven't even been able to find a similar example to go by. If there is a name for this please share that and I can do the lookup and figure out the code, just unsure what to search.

Thanks

CodePudding user response:

Here's a dplyr suggestion:

library(dplyr)
library(tidyr) # pivot_*
dat %>%
  mutate(category = gsub(" .*", "", category)) %>%
  pivot_longer(-c(state, category), names_to = "year") %>%
  pivot_wider(c(state, year), names_from = category, values_from = value)
# # A tibble: 7 x 4
#   state   year  Jail   Percent
#   <chr>   <chr> <chr>  <chr>  
# 1 Alabama 1978  3,642  28%    
# 2 Alabama 1983  3,838  47%    
# 3 Alabama 1988  4,363  57%    
# 4 Alabama 1993  7,873  48%    
# 5 Alabama 1999  9,327  58%    
# 6 Alabama 2006  11,430 70%    
# 7 Alabama 2013  10,436 68%    

You may want to clean up the columns a bit (for numbers, etc), perhaps

dat %>%
  mutate(category = gsub(" .*", "", category)) %>%
  pivot_longer(-c(state, category), names_to = "year") %>%
  pivot_wider(c(state, year), names_from = category, values_from = value) %>%
  mutate(across(c(year, Jail), ~ as.integer(gsub("\\D", "", .))))
# # A tibble: 7 x 4
#   state    year  Jail Percent
#   <chr>   <int> <int> <chr>  
# 1 Alabama  1978  3642 28%    
# 2 Alabama  1983  3838 47%    
# 3 Alabama  1988  4363 57%    
# 4 Alabama  1993  7873 48%    
# 5 Alabama  1999  9327 58%    
# 6 Alabama  2006 11430 70%    
# 7 Alabama  2013 10436 68%    

(There are many ways to deal with cleaning it up.)


Data

dat <- structure(list(state = c("Alabama", "Alabama"), category = c("Jail population", "Percent pre-trial"), `1978` = c("3,642", "28%"), `1983` = c("3,838", "47%"), `1988` = c("4,363", "57%"), `1993` = c("7,873", "48%"), `1999` = c("9,327", "58%"), `2006` = c("11,430", "70%"), `2013` = c("10,436", "68%")), row.names = c("1", "2"), class = "data.frame")
  •  Tags:  
  • r
  • Related