Given an R data frame with two columns:
dfc <- data.frame(col1=c(1,4,3,3,2),col2=c(3,5,11,10,4))
dfc
col1 col2
1 3
4 5
3 11
3 10
2 4
I would like to sort first by col1, then by col2, where the column names are stored in a character vector cols:
cols=c("col1","col2")
However, I would like a way to sometimes use descending order for col1 and ascending order for col2.
Most guides (here and here) don't use a character vector of column names. That part is answered in a similar question, but it does not cover descending versus ascending order, or an easy way to switch between them and throw in another column.
I need this quite often and got tired of not finding the answer, so I wrote a function that uses eval(parse(text="EXPRESSION_HERE")):
order_dataframe_by_cols <- function(dfc, cols=c("col1","col2"), dec_ace=NA) {
  # Takes a data frame "dfc" and sorts it by each column in "cols",
  # in ascending ("") or descending ("-") order, given per column by dec_ace.
  if (length(dec_ace) == 1) {
    if (is.na(dec_ace)) {dec_ace <- rep("", length(cols))}
  }
  str_eval <- "dfc <- dfc[order("
  ix2 <- ""
  for (ix in seq_along(cols)) {
    if (ix > 1) {ix2 <- ","}
    str_eval <- paste(str_eval, ix2, dec_ace[ix], "dfc[,'", cols[ix], "']", sep="")
  }
  str_eval <- paste(str_eval, "),]")
  eval(parse(text=str_eval))
  return(dfc)
}
So, ascending col1 and descending col2:
order_dataframe_by_cols(dfc,cols=c("col1","col2"),dec_ace=c("","-"))
col1 col2
1 3
2 4
3 11
3 10
4 5
And then ascending col1 and ascending col2:
order_dataframe_by_cols(dfc,cols=c("col1","col2"),dec_ace=c("",""))
col1 col2
1 3
2 4
3 10
3 11
4 5
Notice that the 10 and 11 change places.
The problem: what if I also want to sort by an as.Date or as.POSIXct variable?
dfc2 <- data.frame(col1=c(1,4,3,3,2),col2=c(3,5,11,10,4),col3=c(as.Date(c("2015-04-11","2016-04-11","2017-04-11","2018-04-11","2019-04-11"))))
dfc2
col1 col2 col3
1 3 2015-04-11
4 5 2016-04-11
3 11 2017-04-11
3 10 2018-04-11
2 4 2019-04-11
order_dataframe_by_cols(dfc2,cols=c("col1","col3","col2"),dec_ace=c("","-",""))
Error in `-.Date`(dfc[, "col3"]) : unary - is not defined for "Date" objects
This fails: I can't sort dates in descending order this way while also sorting by other variables.
I can sort by the date, but only by that column on its own:
dfc2[order(dfc2[,"col3"]),]
So I need a way to sort a data frame using a character vector of column names, with ascending or descending order defined separately for each column, that also works on date variables. Thank you for reading.
CodePudding user response:
One possible way to solve your problem:
dfc[do.call(order, c(dfc[cols], decreasing=list(c(FALSE, TRUE)), method="radix")), ]
col1 col2
1 1 3
5 2 4
3 3 11
4 3 10
2 4 5
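The same pattern should carry over to the Date column from the question, because method="radix" takes one decreasing flag per sort key instead of negating the values. A minimal sketch, rebuilding dfc2 from the question (the name res is only for illustration):

```r
# dfc2 as defined in the question
dfc2 <- data.frame(
  col1 = c(1, 4, 3, 3, 2),
  col2 = c(3, 5, 11, 10, 4),
  col3 = as.Date(c("2015-04-11", "2016-04-11", "2017-04-11",
                   "2018-04-11", "2019-04-11"))
)
cols <- c("col1", "col3", "col2")

# col1 ascending, col3 (a Date) descending, col2 ascending;
# unname() keeps the column names from being read as argument names of order()
res <- dfc2[do.call(order, c(unname(dfc2[cols]),
                             decreasing = list(c(FALSE, TRUE, FALSE)),
                             method = "radix")), ]
res
```

Within the two rows where col1 is 3, the later date (2018-04-11) comes first, so no unary minus on the Date column is needed.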
Simplified version of your function:
order_dataframe_by_cols = function(df, cols=c("col1","col2"), dec_ace=FALSE) {
  # recycle dec_ace so that there is one TRUE/FALSE per sort column
  dec_ace = rep_len(dec_ace, length(cols))
  df[do.call(order, c(unname(df[cols]), decreasing=list(dec_ace), method="radix")),]
}
cols=c("col1","col2")
order_dataframe_by_cols(dfc, cols) # all ascending
order_dataframe_by_cols(dfc, cols, dec_ace=FALSE) # same
order_dataframe_by_cols(dfc, cols, dec_ace=TRUE) # all descending
order_dataframe_by_cols(dfc, cols, dec_ace=c(FALSE, TRUE)) # first ascending, second descending
order_dataframe_by_cols(dfc, cols, dec_ace=c(TRUE, FALSE)) # first descending, second ascending
CodePudding user response:
Generalising my solution from the other question to accept an ascending/descending parameter:
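sort_df_by_cols <- function (df, sort_key, sort_order) {
  # "asc" leaves a column as is; "desc" negates its numeric sort key from xtfrm()
  order_tf <- c(asc = identity, desc = \(x) -xtfrm(x))
  # df[sort_key] stays a list of columns even when sort_key has length 1
  df[do.call("order", Map(\(k, a) a(k), df[sort_key], order_tf[sort_order])), ]
}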
To be called as follows:
df <- data.frame(
  var1 = c("b","a","b","a"),
  var2 = c("l","l","k","k"),
  var3 = c("t","w","x","t")
)
sort_df_by_cols(df, c("var1", "var2"), c("desc", "asc"))
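For reference, a small sketch of why xtfrm() is useful for descending order: unary minus is not defined for Date or character vectors, but xtfrm() returns a numeric vector that sorts the same way, and that vector can be negated. The vectors d and s are made up for illustration:

```r
d <- as.Date(c("2015-04-11", "2017-04-11", "2016-04-11"))
s <- c("b", "a", "c")

# order(-d) and order(-s) would raise an error; the numeric sort keys can be negated
desc_d <- order(-xtfrm(d))  # positions from latest to earliest date
desc_s <- order(-xtfrm(s))  # positions in reverse alphabetical order

d[desc_d]
s[desc_s]
```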
… of course, existing packages already provide other/better solutions to exactly this problem, e.g. ‘dplyr’, which provides the arrange() function. I generally recommend using this function instead of writing your own.
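As a rough sketch of the arrange() route, assuming the dplyr package is installed: desc() marks a column as descending, and the .data pronoun lets the column names be given as strings. The data frame repeats dfc2 from the question, and res is just an illustrative name.

```r
library(dplyr)

# dfc2 as defined in the question
dfc2 <- data.frame(
  col1 = c(1, 4, 3, 3, 2),
  col2 = c(3, 5, 11, 10, 4),
  col3 = as.Date(c("2015-04-11", "2016-04-11", "2017-04-11",
                   "2018-04-11", "2019-04-11"))
)

# col1 ascending, col3 (a Date) descending, col2 ascending
res <- arrange(dfc2, .data[["col1"]], desc(.data[["col3"]]), .data[["col2"]])
res
```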
CodePudding user response:
I like all the above answers, as they answer the OP using a dec_ace argument, although I found this a little clunky myself.
I found a way to do this with a - in front of the column name to designate descending order, similar to the original way of using order:
dfc[order(-dfc$col1, dfc$col2),]
using this function:
order_dataframe_by_cols <- function(dfc, cols=NA, defaultToDecending=FALSE) {
  # cols: column names; a leading "-" marks a column as descending,
  # a leading " " is simply stripped. Defaults to all columns of dfc.
  if (length(cols) == 1) {if (is.na(cols)) {cols <- names(dfc)}}
  # a column is descending if it has the "-" prefix or if descending is the default
  dec_ace <- startsWith(cols, "-") | defaultToDecending
  # strip only a leading " " or "-", so a "-" elsewhere in a name is left alone
  cols <- sub("^[ -]", "", cols)
  dfc <- dfc[do.call(order, c(unname(dfc[cols]), decreasing=list(dec_ace), method="radix")),]
  return(dfc)
}
dfc <- data.frame(col1=c(1,4,3,3,2),col2=c(3,5,11,10,4),col3=c(as.Date(c("2015-04-11","2016-04-11","2017-04-11","2018-04-11","2019-04-11"))))
Order by col1 descending, col3 ascending and col2 ascending:
order_dataframe_by_cols(dfc,cols=c("-col1","col3"," col2"))
Order by col1 ascending, col3 descending and col2 ascending:
order_dataframe_by_cols(dfc,cols=c("col1","-col3"," col2"))
I guess it's just a matter of taste.