I have a problem with some R code: sports data spanning a number of years is not sorted in a logical order. My dataset has 42 variables and almost 80,000 cases, and one of the variables is paraphrased below:
dat <- c(2020, 2020, 2020, 2020, 2020, 2020, 2020)
r <- c("QF", "R1", "R15", "R2", "R25", "R3", "SF")
data <- data.frame(dat, r)
Obviously each case has only one round label, not all of them, and the real data has far more than 26 cases.
The problem is that instead of ordering the rounds as R1-R25 followed by QF, SF and GF, R sorts them as GF, QF, R1, R10-R19, R2, R20-R25, R3-R9, SF. The values are compared as text, one character at a time, so the first digit after the "R" decides the order and the letter codes sort alphabetically.
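To see the text comparison in isolation, here is a small base R illustration (the vector rounds is made up for the example). Because values are compared one character at a time, "R10" sorts before "R2" ('1' comes before '2'), and "R1" sorts before "R10" because it is a prefix of it.

```r
# Made-up round labels, in no particular order
rounds <- c("R2", "QF", "R10", "GF", "R1", "SF", "R25")

# method = "radix" sorts character vectors in C-locale byte order,
# so the result does not depend on the session locale
lexical <- sort(rounds, method = "radix")
print(lexical)
# "GF" "QF" "R1" "R10" "R2" "R25" "SF"
```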
This is how I want it to look, but I can't go through 80,000 cases manually like this:
dat <- c(2020, 2020, 2020, 2020, 2020, 2020, 2020)
r <- c("R1", "R2", "R3", "R15", "R25", "QF", "SF")
data <- data.frame(dat, r)
Thanks :)
Answer:
Since you want "QF" and "SF" at the end, one option is to extract the number from the r column and order the rows by it. "QF" and "SF" contain no number, so they return NA, and order() places NA values last by default.
# Extract the digits from each label (NA for "QF"/"SF"), then order the rows by that number
result <- data[order(as.numeric(stringr::str_extract(data$r, '\\d+'))), ]
# dat r
#2 2020 R1
#4 2020 R2
#6 2020 R3
#3 2020 R15
#5 2020 R25
#1 2020 QF
#7 2020 SF
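The same extract-the-number idea can be sketched in base R without stringr. This assumes the round number is the only run of digits in each label; the data frame below is made up for illustration, and it adds dat as the first sort key so that each year's rounds stay together. Labels without a number all tie on NA, so QF, SF (and GF) keep whatever relative order they had in the input.

```r
# Made-up example spanning two seasons
data <- data.frame(
  dat = c(2021, 2020, 2020, 2021, 2020, 2020),
  r   = c("R2", "QF", "R10", "R1", "R2", "SF")
)

# Remove every non-digit; labels with no digits become "" and then NA
round_num <- suppressWarnings(as.numeric(gsub("\\D", "", data$r)))

# Sort by year first, then round number; na.last = TRUE is the default,
# and ties (including the NA rows) keep their original relative order
result <- data[order(data$dat, round_num), ]
print(result)
```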
Answer:
Here's a tidyverse solution:
library(tidyverse)
# Reorder whole rows (not just the r column) using a numeric-aware string order
data %>%
  slice(str_order(r, numeric = TRUE))
Edit:
To arrange as "R, Q, S", you can take the first character of your r variable with str_sub() and apply a custom sort using arrange() and match():
data %>%
  slice(str_order(r, numeric = TRUE)) %>%
  # Letters not listed (e.g. "G" for GF) give NA, which arrange() puts last
  arrange(match(str_sub(r, 1, 1), c("R", "Q", "S")))
This gives us:
dat r
1 2020 R1
2 2020 R2
3 2020 R3
4 2020 R15
5 2020 R25
6 2020 QF
7 2020 SF
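The match() idea can also be taken one step further with an explicit factor, which fixes the full round order in one place and puts GF last deliberately rather than via NA. A minimal base R sketch, assuming the rounds are exactly R1 to R25 plus QF, SF and GF as described in the question; round_levels and the sample data are hypothetical. Any label outside round_levels would become NA, hence the check before sorting.

```r
# Hypothetical complete list of round labels, in the desired order
round_levels <- c(paste0("R", 1:25), "QF", "SF", "GF")

# Made-up example spanning two seasons
data <- data.frame(
  dat = c(2021, 2020, 2020, 2021, 2020, 2020, 2020),
  r   = c("GF", "QF", "R15", "R1", "R2", "GF", "SF")
)

# A factor sorts by the position of its level, not alphabetically
data$r <- factor(data$r, levels = round_levels)

# Unexpected labels would silently turn into NA; fail loudly instead
stopifnot(!anyNA(data$r))

# Year first, then round in the order given by round_levels
result <- data[order(data$dat, data$r), ]
print(result)
```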