Generalize ordering of a vector using a custom order with changing values

Time:09-17

I have a vector that I am trying to order in a specific way. Its elements are column names: a list of state abbreviations, a subset for each state denoted by "_sub", and a calculation on each of these columns denoted by "_pct". Additionally, there's an element called "var" and elements "US" and "US_pct" for the national level.

Here's a reproducible data set:

vec <- c("var", "NY", "AK", "UT", "US", "NY_sub", "UT_sub", "AK_sub", "AK_pct", "AK_sub_pct", "NY_sub_pct", "UT_sub_pct", "UT_pct", "NY_pct", "US_pct")

I'd like the states to be in alphabetical order overall, with a fixed order within each state's group. For example, the "AK" set should come right after "var" and before the "NY" set. Within the set, I'd like "AK_sub" first, then "AK_sub_pct", then "AK", then "AK_pct". Every other state should follow the same pattern. The "US" set should come last, in the same within-group order. There's also no "US_sub".

I also don't know which states will be included in the vector until running the code so I can't specify the order exactly using match. It has to be done generally.

The alphabetized part is easy: sort(vec), but I'm not sure how to go about doing the rest.

Here's my desired outcome. dplyr solutions welcome.

c("var", "AK_sub", "AK_sub_pct", "AK", "AK_pct", "NY_sub", "NY_sub_pct", "NY", "NY_pct", "UT_sub", "UT_sub_pct", "UT", "UT_pct", "US", "US_pct")

CodePudding user response:

Here is one option in the tidyverse. The idea is to sort (arrange) on two keys. First, extract the prefix before the first _ (a state abbreviation, 'US', or 'var') with word, and turn it into a factor whose levels are 'var', then state.abb, then 'US'. Second, extract the suffix ('sub', 'sub_pct', 'pct', or none) with str_extract and rank it with match against the suffixes listed in the order we want; names without a suffix give NA, which is why NA appears in the lookup vector. Lastly, select or pull the original 'vec' column.
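As a quick check of the suffix key, here is a minimal sketch of how the order of alternatives in the pattern affects what str_extract returns. The sample vector x is made up for illustration, and the sketch assumes stringr's regex engine tries alternatives from left to right.

```r
library(stringr)

# Hypothetical sample names covering every suffix case
x <- c("AK", "AK_pct", "AK_sub", "AK_sub_pct")

# Shorter alternative listed first: "sub" is returned for "AK_sub_pct"
short_first <- str_extract(x, "sub|pct|sub_pct")

# Longest alternative listed first: "sub_pct" is captured in full
long_first <- str_extract(x, "sub_pct|sub|pct")

print(short_first)
print(long_first)
```

With the shorter alternative first, "AK_sub" and "AK_sub_pct" receive the same key, so the sort key alone cannot tell them apart. Listing "sub_pct" first gives each suffix its own key.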

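library(dplyr)
library(stringr)
library(tibble)
out <- tibble(vec) %>%
   # prefix before the first "_": a state abbreviation, "US", or "var"
   mutate(new = word(vec, 1, sep="_")) %>%
   # primary key: "var", then states, then "US"
   # (state.abb follows state-name order; wrap it in sort() for abbreviation order)
   arrange(factor(new, levels = c('var', state.abb, "US")), 
       # secondary key: suffix rank; "sub_pct" comes first in the pattern
       # because alternation takes the first alternative that matches
       match(str_extract(vec, "sub_pct|sub|pct"), 
          c("sub", "sub_pct", NA, "pct")))  %>%
   select(vec)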

-output

> out
# A tibble: 15 x 1
   vec       
   <chr>     
 1 var       
 2 AK_sub    
 3 AK_sub_pct
 4 AK        
 5 AK_pct    
 6 NY_sub    
 7 NY_sub_pct
 8 NY        
 9 NY_pct    
10 UT_sub    
11 UT_sub_pct
12 UT        
13 UT_pct    
14 US        
15 US_pct       
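
If the prefixes are not known in advance, or are not all in state.abb, the same two-key idea can be sketched in base R. The helper name order_cols and its arguments are hypothetical, and the sketch assumes every name is a prefix optionally followed by one of the listed suffixes. Unlike state.abb, it sorts the middle groups alphabetically by abbreviation.

```r
# Hypothetical helper: order names by prefix group, then by suffix within group
order_cols <- function(x, first = "var", last = "US",
                       suffix_order = c("_sub", "_sub_pct", "", "_pct")) {
  prefix <- sub("_.*$", "", x)               # text before the first underscore
  suffix <- substring(x, nchar(prefix) + 1)  # remainder, "" when there is no suffix
  # groups other than `first` and `last`, sorted alphabetically
  middle <- sort(setdiff(unique(prefix), c(first, last)))
  group_rank  <- match(prefix, c(first, middle, last))
  suffix_rank <- match(suffix, suffix_order)
  x[order(group_rank, suffix_rank)]
}

vec <- c("var", "NY", "AK", "UT", "US", "NY_sub", "UT_sub", "AK_sub",
         "AK_pct", "AK_sub_pct", "NY_sub_pct", "UT_sub_pct", "UT_pct",
         "NY_pct", "US_pct")

result <- order_cols(vec)
print(result)
```

Because the groups are derived from the vector itself, no list of states has to be supplied up front.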