Home > Blockchain >  Sort column in R: strings first (alphabetically), then numbers (numerically)
Sort column in R: strings first (alphabetically), then numbers (numerically)

Time:11-23

I have a dataframe in R with a column consisting of both letters and numbers (eg. "A", "B", "1", "2" etc). I would like to sort the dataframe in such a way that I have the letters first (alphabetically sorted) followed by the numbers (numerically sorted). Ideally in a tidyverse way, but not necessarily.

gtools::mixedsort does almost what I want, but puts numbers before strings and I do not think there is an argument that allows you to push the numbers to the back.

I considered splitting the dataframe, sort each one separately, and then bind by rows again. But I am guessing there should be a better way to do this?

Here is also an example to further clarify my question.

I have:

Col1    Col2   Col3
Apples     A     90
Pears     12     90
Bananas    C     50
Cake       1     50
Apples     A     90
Pears      B     90
Bananas    2     50
Cake     100     50

What I try to achieve is sorting by Col2, alphabetically first, then numerically:

Col1    Col2   Col3
Apples     A     90
Apples     A     90
Apples     A     90
Apples     A     90
Pears      B     90
Bananas    C     50
Cake       1     50
Bananas    2     50
Pears     12     90
Cake     100     50

Many thanks!

CodePudding user response:

Maybe not an optimal solution, but using dplyr and calling df the data you provided:

numbers <- df %>% 
  filter(!(Col2 %in% LETTERS)) %>% 
  mutate(Col2 = as.numeric(Col2)) %>% 
  arrange(Col2) %>% 
  mutate(Col2 = as.character(Col2))

non_numbers <- df %>% 
  filter(Col2 %in% LETTERS) %>% 
  arrange(Col2)

output <- rbind(non_numbers, numbers)

Output:

    Col1 Col2 Col3
1  Apples    A   90
2  Apples    A   90
3   Pears    B   90
4 Bananas    C   50
5    Cake    1   50
6 Bananas    2   50
7   Pears   12   90
8    Cake  100   50

CodePudding user response:

Not in the exact order you want.. but rowbinding can solve that.

DT <- fread("Col1    Col2   Col3
Apples     A     90
Pears     12     90
Bananas    C     50
Cake       1     50
Apples     A     90
Pears      B     90
Bananas    2     50
Cake     100     50")

library(gtools)
DT[mixedorder(DT$Col2),]

CodePudding user response:

For a base R option:

df <- data.frame(Col2=c("100", "B", "A", "Z", "10", "4"), stringsAsFactors=FALSE)
df[order(grepl("^\\d $", df$Col2), sprintf("s", df$Col2)), ]

[1] "A"   "B"   "Z"   "4"   "10"  "100"

The two sorting levels here first place letters before numbers. The second sorting level left pads everything to 10 characters with zeroes. Then it sorts ascending. This is effectively an ascending numeric sort for the numbers. The trick here is to realize that number strings actually do sort correctly as text if they all have the same width.

CodePudding user response:

You can convert using as.numeric to convert the numbers to numeric, replace the places of numbers with NA using is.na<- and order the two vectors.

s <- c("A", 12, "C", 1, "A", "B", 2, 100)

x <- as.numeric(s)
s[order('is.na<-'(s, !is.na(x)), x)]
#[1] "A"   "A"   "B"   "C"   "1"   "2"   "12"  "100"
  • Related