arrange() not working when dealing with letters and numbers-CodePudding

I've seen many posts related to arrange() issues, but none of them solved my situation, hopefully, this is not a duplicate. I have some columns named "Q1", "Q2", "Q3" and so on. After calculating some basic descriptive stats with rstatix::get_summary_stats(), I need to arrange the new column variable in ascending order (ie, Q1 before Q2 before Q3, etc). I'm sure this is a silly problem, but I can't see what I'm doing wrong.

the raw data looks like this:

ID Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15
1 PART1  4  1  1  5  5  5  1  5  1   1   3   5   5   1   5
2 PART2  5  4  1  5  5  4  1  5  2   1   3   5   4   1   5
3 PART3  2  4  3  5  5  4  1  5  2   1   3   5   4   1   5
so on...

My attempt:

descriptive <-  data %>% 
  rstatix::get_summary_stats(show = c("mean", "sd", "median", "iqr", "min", "max"))  %>% 
  mutate_if(is.numeric, round, 2) %>% 
  dplyr::arrange(variable)

The first 10 lines:

A tibble: 15 x 8
   variable     n  mean    sd median   iqr   min   max
   <chr>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
 1 Q1          63  3.94  1.03      4   2       2     5
 2 Q10         63  1.84  0.88      2   2       1     3
 3 Q11         63  2.62  1.31      3   3       1     5
 4 Q12         63  3.98  1.01      4   2       2     5
 5 Q13         63  4.33  0.8       5   1       2     5
 6 Q14         63  1.91  0.88      2   2       1     4
 7 Q15         63  4.25  0.95      5   1       2     5
 8 Q2          63  2.86  1.58      3   3       1     5
 9 Q3          63  1.97  1.06      2   2       1     4
10 Q4          63  3.98  1.04      4   2       2     5

Note: I've already tried ungroup() and across(starts_with("Q*"))), but nothing works. Any thoughts would be much appreciated, thanks in adv.

data:

> dput(descriptive)[1:10, ]
structure(list(variable = c("Q1", "Q10", "Q11", "Q12", "Q13", 
"Q14", "Q15", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7", "Q8", "Q9"), 
    n = c(63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 
    63, 63), mean = c(3.94, 1.84, 2.62, 3.98, 4.33, 1.91, 4.25, 
    2.86, 1.97, 3.98, 4.21, 4.05, 2.38, 4.03, 2.25), sd = c(1.03, 
    0.88, 1.31, 1.01, 0.8, 0.88, 0.95, 1.58, 1.06, 1.04, 0.94, 
    1.04, 1.36, 1.05, 1.12), median = c(4, 2, 3, 4, 5, 2, 5, 
    3, 2, 4, 4, 4, 2, 4, 2), iqr = c(2, 2, 3, 2, 1, 2, 1, 3, 
    2, 2, 1, 2, 2.5, 2, 2), min = c(2, 1, 1, 2, 2, 1, 2, 1, 1, 
    2, 2, 1, 1, 2, 1), max = c(5, 3, 5, 5, 5, 4, 5, 5, 4, 5, 
    5, 5, 5, 5, 5)), row.names = c(NA, -15L), class = c("tbl_df", 
"tbl", "data.frame"))

CodePudding user response：

How about just use arrange() on the integer part of variable?

descriptive %>% arrange(as.integer(gsub("Q","",variable)))

Output:

# A tibble: 15 × 8
   variable     n  mean    sd median   iqr   min   max
   <chr>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
 1 Q1          63  3.94  1.03      4   2       2     5
 2 Q2          63  2.86  1.58      3   3       1     5
 3 Q3          63  1.97  1.06      2   2       1     4
 4 Q4          63  3.98  1.04      4   2       2     5
 5 Q5          63  4.21  0.94      4   1       2     5
 6 Q6          63  4.05  1.04      4   2       1     5
 7 Q7          63  2.38  1.36      2   2.5     1     5
 8 Q8          63  4.03  1.05      4   2       2     5
 9 Q9          63  2.25  1.12      2   2       1     5
10 Q10         63  1.84  0.88      2   2       1     3
11 Q11         63  2.62  1.31      3   3       1     5
12 Q12         63  3.98  1.01      4   2       2     5
13 Q13         63  4.33  0.8       5   1       2     5
14 Q14         63  1.91  0.88      2   2       1     4
15 Q15         63  4.25  0.95      5   1       2     5

CodePudding user response：

We could use mixedorder which would work even if the values have different prefix

library(dplyr)
descriptive %>% 
   arrange(order(gtools::mixedorder(variable)))

-output

# A tibble: 15 × 8
   variable     n  mean    sd median   iqr   min   max
   <chr>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
 1 Q1          63  3.94  1.03      4   2       2     5
 2 Q2          63  2.86  1.58      3   3       1     5
 3 Q3          63  1.97  1.06      2   2       1     4
 4 Q4          63  3.98  1.04      4   2       2     5
 5 Q5          63  4.21  0.94      4   1       2     5
 6 Q6          63  4.05  1.04      4   2       1     5
 7 Q7          63  2.38  1.36      2   2.5     1     5
 8 Q8          63  4.03  1.05      4   2       2     5
 9 Q9          63  2.25  1.12      2   2       1     5
10 Q10         63  1.84  0.88      2   2       1     3
11 Q11         63  2.62  1.31      3   3       1     5
12 Q12         63  3.98  1.01      4   2       2     5
13 Q13         63  4.33  0.8       5   1       2     5
14 Q14         63  1.91  0.88      2   2       1     4
15 Q15         63  4.25  0.95      5   1       2     5

Or with parse_number

descriptive %>%
   arrange(readr::parse_number(variable))

CodePudding user response：

There are already better soultions. Just for fun:

We could split variable column with reprex (?<=[A-Za-z])(?=[0-9]) and then arrange:

library(tidyr)
library(dplyr)

df %>% 
  separate(variable, c("quarter", "number"), sep = "(?<=[A-Za-z])(?=[0-9])", remove = FALSE) %>% 
  arrange(quarter, as.numeric(number)) %>% 
  select(-c(quarter, number))

  variable     n  mean    sd median   iqr   min   max
   <chr>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
 1 Q1          63  3.94  1.03      4   2       2     5
 2 Q2          63  2.86  1.58      3   3       1     5
 3 Q3          63  1.97  1.06      2   2       1     4
 4 Q4          63  3.98  1.04      4   2       2     5
 5 Q5          63  4.21  0.94      4   1       2     5
 6 Q6          63  4.05  1.04      4   2       1     5
 7 Q7          63  2.38  1.36      2   2.5     1     5
 8 Q8          63  4.03  1.05      4   2       2     5
 9 Q9          63  2.25  1.12      2   2       1     5
10 Q10         63  1.84  0.88      2   2       1     3
11 Q11         63  2.62  1.31      3   3       1     5
12 Q12         63  3.98  1.01      4   2       2     5
13 Q13         63  4.33  0.8       5   1       2     5
14 Q14         63  1.91  0.88      2   2       1     4
15 Q15         63  4.25  0.95      5   1       2     5