I have a dataframe in R with a column consisting of both letters and numbers (eg. "A", "B", "1", "2" etc). I would like to sort the dataframe in such a way that I have the letters first (alphabetically sorted) followed by the numbers (numerically sorted). Ideally in a tidyverse way, but not necessarily.
gtools::mixedsort does almost what I want, but puts numbers before strings and I do not think there is an argument that allows you to push the numbers to the back.
I considered splitting the dataframe, sort each one separately, and then bind by rows again. But I am guessing there should be a better way to do this?
Here is also an example to further clarify my question.
I have:
Col1 Col2 Col3
Apples A 90
Pears 12 90
Bananas C 50
Cake 1 50
Apples A 90
Pears B 90
Bananas 2 50
Cake 100 50
What I try to achieve is sorting by Col2, alphabetically first, then numerically:
Col1 Col2 Col3
Apples A 90
Apples A 90
Apples A 90
Apples A 90
Pears B 90
Bananas C 50
Cake 1 50
Bananas 2 50
Pears 12 90
Cake 100 50
Many thanks!
CodePudding user response:
Maybe not an optimal solution, but using dplyr
and calling df
the data you provided:
numbers <- df %>%
filter(!(Col2 %in% LETTERS)) %>%
mutate(Col2 = as.numeric(Col2)) %>%
arrange(Col2) %>%
mutate(Col2 = as.character(Col2))
non_numbers <- df %>%
filter(Col2 %in% LETTERS) %>%
arrange(Col2)
output <- rbind(non_numbers, numbers)
Output:
Col1 Col2 Col3
1 Apples A 90
2 Apples A 90
3 Pears B 90
4 Bananas C 50
5 Cake 1 50
6 Bananas 2 50
7 Pears 12 90
8 Cake 100 50
CodePudding user response:
Not in the exact order you want.. but rowbinding can solve that.
DT <- fread("Col1 Col2 Col3
Apples A 90
Pears 12 90
Bananas C 50
Cake 1 50
Apples A 90
Pears B 90
Bananas 2 50
Cake 100 50")
library(gtools)
DT[mixedorder(DT$Col2),]
CodePudding user response:
For a base R option:
df <- data.frame(Col2=c("100", "B", "A", "Z", "10", "4"), stringsAsFactors=FALSE)
df[order(grepl("^\\d $", df$Col2), sprintf("s", df$Col2)), ]
[1] "A" "B" "Z" "4" "10" "100"
The two sorting levels here first place letters before numbers. The second sorting level left pads everything to 10 characters with zeroes. Then it sorts ascending. This is effectively an ascending numeric sort for the numbers. The trick here is to realize that number strings actually do sort correctly as text if they all have the same width.
CodePudding user response:
You can convert using as.numeric
to convert the numbers to numeric, replace the places of numbers with NA
using is.na<-
and order
the two vectors.
s <- c("A", 12, "C", 1, "A", "B", 2, 100)
x <- as.numeric(s)
s[order('is.na<-'(s, !is.na(x)), x)]
#[1] "A" "A" "B" "C" "1" "2" "12" "100"