Home > Mobile >  Sort column names alphabetically before underscore, then sort differently after underscore
Sort column names alphabetically before underscore, then sort differently after underscore

Time:03-16

I have a dataframe and want to sort the columns alphabetically, but only up to a certain point within the name, then the order for each variable has to be changed depending on the suffix, how would this be done?

For example:

a_start <- sample(5)
a_end <- sample(5)
b_start <- sample(5)
b_end <- sample(5)

df <- data.frame(a_end, a_start, b_end, b_start)
df

  a_end a_start b_end b_start
1     1       4     1       5
2     2       2     5       1
3     5       1     3       2
4     4       3     2       4
5     3       5     4       3

If the columns have been sorted alphabetically, we would get the above. So how would I sort the names alphabetically before the underscore, then choose to have start before end like so:

  a_start a_end b_start b_end
1       4     1       5     1
2       2     2       1     5
3       1     5       2     3
4       3     4       4     2
5       5     3       3     4

For such a trivial example, it's easy enough to just sort the columns manually, but if there are hundreds of columns how could it be done?

Edit: Thanks for the replies so far but I've realised this example is not as comprehensive as it could have been. It should contain more than one underscore and sort alphabetically up to the last one, at which point the suffix gets sorted differently.

eg.

col_a_start <- sample(5)
col_a_end <- sample(5)
col_b_start <- sample(5)
col_b_end <- sample(5)

df <- data.frame(col_a_end, col_a_start, col_b_end, col_b_start)

sorted to give:

  col_a_start col_a_end col_b_start col_b_end
1           2         4           4         3
2           3         1           2         1
3           5         5           1         5
4           4         2           5         4
5           1         3           3         2

CodePudding user response:

With the column name patterns showed, here is a base R option

df[order(trimws(names(df), whitespace = "_.*"), 
    factor(trimws(names(df), whitespace = ".*_"), 
      levels = c("start", "end")))]

-output

   a_start a_end b_start b_end
1       1     5       4     5
2       5     3       2     3
3       3     2       3     4
4       4     4       1     2
5       2     1       5     1

CodePudding user response:

I've taken a stab at the answer. Basically create a new table with the column names, extract in separate columns the column name components you want to control then arrange the column names in order and lastly add this as argument into the relocate() formula

#assign col names into new tibble
col_names <- tibble(x=df %>% names())

#extract out into new columns name components 
col_names_order <- col_names %>% 
  mutate(before=str_extract(x,".*(?=_)"),
         after=str_extract(x,"(?<=_).*")
) %>% 
  arrange(before,desc(after)) %>%
 pull(x)

#rearrange colnames by this logic
df %>% relocate(col_names_order)

  •  Tags:  
  • r
  • Related