Order an R data frame programmatically using a character vector of column names with different decend

Time:10-26

Given an R data frame with two columns:

dfc <-  data.frame(col1=c(1,4,3,3,2),col2=c(3,5,11,10,4))

dfc
  col1 col2
     1    3
     4    5
     3   11
     3   10
     2    4

I would like to sort first by col1, then by col2,

where the column names are stored in a character vector cols:

cols=c("col1","col2")

However, I would like a way to sometimes use descending order for col1 and ascending order for col2, for example.

Most guides (here and here) don't use a character vector of column names. A similar question does cover that part, but it does not address descending versus ascending order, or how to switch between the two easily and add another column.

I need this quite often and got tired of not finding an answer, so I wrote a function that uses eval(parse(text="EXPRESSION_HERE")):

order_dataframe_by_cols <- function(dfc,cols=c("col1","col2"),dec_ace=NA) {
  # takes a data frame "dfc", and sorts it by each column in "cols",
  # in a descending or ascending order, defined by dec_ace by each col


  if (length(dec_ace) == 1) {
    if (is.na(dec_ace)) {dec_ace <- rep("",length(cols))}
  }

  str_eval <- "dfc <- dfc[order("
  ix2 <- ""

  for (ix in 1:length(cols)){
    if (ix > 1) {ix2 <- ","}
    str_eval <- paste(str_eval,ix2,dec_ace[ix],"dfc[,'",cols[ix],"']",sep="")
  }
  str_eval <- paste(str_eval,"),]")

  eval(parse(text=str_eval))


  return(dfc)
}
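
To make the string-building step concrete, here is a minimal sketch of the expression the function assembles for cols=c("col1","col2") and dec_ace=c("","-"). It uses a vectorised paste0() in place of the loop (my substitution, not the code above), and the result should match what the loop builds apart from whitespace.

```r
cols    <- c("col1", "col2")
dec_ace <- c("", "-")   # "" = ascending, "-" = descending

# One order() argument per column, with the optional "-" pasted in front.
terms <- paste0(dec_ace, "dfc[,'", cols, "']")

# The full statement that is later handed to eval(parse(text = ...)).
str_eval <- paste0("dfc <- dfc[order(", paste(terms, collapse = ","), "),]")
print(str_eval)
```

The "-" ends up as a unary minus applied to the column itself, which is fine for numeric columns but not for every column type.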

So, for ascending col1 and descending col2:

order_dataframe_by_cols(dfc,cols=c("col1","col2"),dec_ace=c("","-"))
col1 col2
  1    3
  2    4
  3   11
  3   10
  4    5

And then for ascending col1 and ascending col2:

  order_dataframe_by_cols(dfc,cols=c("col1","col2"),dec_ace=c("",""))
  col1 col2
    1    3
    2    4
    3   10
    3   11
    4    5

Notice that 10 and 11 swap places.

The problem:

What if I also want to sort by a Date or POSIXct column (created with as.Date or as.POSIXct)?

dfc2 <- data.frame(col1=c(1,4,3,3,2),col2=c(3,5,11,10,4),col3=c(as.Date(c("2015-04-11","2016-04-11","2017-04-11","2018-04-11","2019-04-11"))))
dfc2 
  col1 col2       col3
     1    3 2015-04-11
     4    5 2016-04-11
     3   11 2017-04-11
     3   10 2018-04-11
     2    4 2019-04-11


order_dataframe_by_cols(dfc2,cols=c("col1","col3","col2"),dec_ace=c("","-",""))
Error in `-.Date`(dfc[, "col3"]) : unary - is not defined for "Date" objects

This fails: unary minus is not defined for dates, so I can't sort dates in descending order this way while also sorting by other variables.

I can sort by the date column, but only on its own:

dfc2[order(dfc2[,"col3"]),]

So I need a way to sort a data frame using a character vector of column names, with a separate ascending or descending order for each column, that also works on date variables. Thank you for reading.

CodePudding user response:

One possible way to solve your problem:

dfc[do.call(order, c(dfc[cols], decreasing=list(c(FALSE, TRUE)), method="radix")), ]

  col1 col2
1    1    3
5    2    4
3    3   11
4    3   10
2    4    5
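
The same pattern can be checked against the Date column from the question. This is a minimal sketch that rebuilds dfc2 and sorts by col1 ascending, col3 descending and col2 ascending. The unname() call and the names idx and sorted2 are my additions for illustration.

```r
# Data frame from the question, including the Date column.
dfc2 <- data.frame(
  col1 = c(1, 4, 3, 3, 2),
  col2 = c(3, 5, 11, 10, 4),
  col3 = as.Date(c("2015-04-11", "2016-04-11", "2017-04-11",
                   "2018-04-11", "2019-04-11"))
)

cols       <- c("col1", "col3", "col2")
decreasing <- c(FALSE, TRUE, FALSE)   # one flag per entry in cols

# unname() keeps column names from being matched against order()'s own arguments.
idx <- do.call(order, c(unname(dfc2[cols]),
                        list(decreasing = decreasing, method = "radix")))
sorted2 <- dfc2[idx, ]
print(sorted2)   # rows come out in the order 1, 5, 4, 3, 2
```

No unary minus is applied to the dates here, so the error from the question does not occur.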

Simplified version of your function:

order_dataframe_by_cols = function(df, cols=c("col1","col2"), dec_ace=FALSE) {
  dec_ace = rep_len(dec_ace, length(cols))  # recycle a single flag to one flag per column
  df[do.call(order, c(unname(df[cols]), decreasing=list(dec_ace), method="radix")),]
}

cols=c("col1","col2")

order_dataframe_by_cols(dfc, cols)                         # all ascending
order_dataframe_by_cols(dfc, cols, dec_ace=FALSE)          # same
order_dataframe_by_cols(dfc, cols, dec_ace=TRUE)           # all descending
order_dataframe_by_cols(dfc, cols, dec_ace=c(FALSE, TRUE)) # first ascending, second descending
order_dataframe_by_cols(dfc, cols, dec_ace=c(TRUE, FALSE)) # first descending, second ascending
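
If misspelled column names or a wrong number of flags are a concern, a few checks can be put in front of the same order() call. The following is only a sketch: order_df_checked and its error messages are hypothetical names of mine, not part of the answer above.

```r
# Hypothetical variant with input checks; the sorting itself is unchanged.
order_df_checked <- function(df, cols, decreasing = FALSE) {
  unknown <- setdiff(cols, names(df))
  if (length(unknown) > 0) {
    stop("unknown column(s): ", paste(unknown, collapse = ", "))
  }
  if (!(length(decreasing) %in% c(1L, length(cols)))) {
    stop("decreasing must have length 1 or one entry per column")
  }
  decreasing <- rep_len(decreasing, length(cols))
  idx <- do.call(order, c(unname(df[cols]),
                          list(decreasing = decreasing, method = "radix")))
  df[idx, , drop = FALSE]
}

dfc <- data.frame(col1 = c(1, 4, 3, 3, 2), col2 = c(3, 5, 11, 10, 4))
res <- order_df_checked(dfc, c("col1", "col2"), c(FALSE, TRUE))
print(res)
```

Whether to fail early or to recycle silently is a design choice; failing early tends to surface typos in cols sooner.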

CodePudding user response:

Generalising my solution from the other question to accept an ascending/descending parameter:

sort_df_by_cols <- function (df, sort_key, sort_order) {
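  # asc keeps the column as is; desc negates its numeric sort key from xtfrm()
  order_tf <- c(asc = identity, desc = \(x) -xtfrm(x))
  # df[sort_key] stays a list of columns even for a single key; unname() keeps
  # column names from clashing with order()'s own arguments
  df[do.call("order", unname(Map(\(k, a) a(k), df[sort_key], order_tf[sort_order]))), ]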
}

To be called as follows:

df <- data.frame(
  var1 = c("b","a","b","a"),
  var2 = c("l","l","k","k"),
  var3 = c("t","w","x","t")
)

sort_df_by_cols(df, c("var1", "var2"), c("desc", "asc"))
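
To see what a descending transformation has to do for a character column, here is a small sketch using the same example data. xtfrm() turns the column into numeric sort keys, and negating those keys reverses the order. The names keys and idx are only for illustration.

```r
df <- data.frame(
  var1 = c("b", "a", "b", "a"),
  var2 = c("l", "l", "k", "k"),
  var3 = c("t", "w", "x", "t")
)

# Numeric sort keys for a character column ("a" gets a lower key than "b").
keys <- xtfrm(df$var1)

# Negating the keys sorts var1 descending; var2 stays ascending within ties.
idx <- order(-keys, df$var2)
print(df[idx, ])   # rows come out in the order 3, 1, 4, 2
```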

… of course existing packages already provide other/better solutions to exactly this problem, e.g. ‘dplyr’, which provides the arrange() function. I generally recommend using this function instead of writing your own.

CodePudding user response:

I like all the above answers, as they answer the question using a dec_ace argument, although I found this a little clunky myself.

I found a way to do this with a - in front of the column name to designate descending order, similar to the original way of using order:

dfc[order(-dfc$col1, dfc$col2),]

using this function:

order_dataframe_by_cols <- function(dfc, cols=NA, defaultToDecending=FALSE) {
  # with no cols given, sort by every column in the data frame
  if (length(cols) == 1 && is.na(cols)) {cols <- names(dfc)}

  # a leading "-" marks a column as descending; the others use the default
  dec_ace <- defaultToDecending | startsWith(cols, "-")
  # strip only a leading "-" or space, so names containing "-" stay intact
  cols <- sub("^[ -]", "", cols)

  dfc <- dfc[do.call(order, c(unname(dfc[cols]), decreasing=list(dec_ace), method="radix")),]

  return(dfc)
}

dfc <- data.frame(col1=c(1,4,3,3,2),col2=c(3,5,11,10,4),col3=c(as.Date(c("2015-04-11","2016-04-11","2017-04-11","2018-04-11","2019-04-11"))))

Order by col1 descending, then col3 ascending and col2 ascending:

order_dataframe_by_cols(dfc,cols=c("-col1","col3"," col2"))

Order by col1 ascending, then col3 descending and col2 ascending:

order_dataframe_by_cols(dfc,cols=c("col1","-col3"," col2"))
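
For cross-checking, the two calls above can be spelled out in the plain order() style. This is a sketch under the assumption that a Date column is negated through its xtfrm() key, since a Date has no unary minus. idx_a and idx_b are names of mine.

```r
dfc <- data.frame(
  col1 = c(1, 4, 3, 3, 2),
  col2 = c(3, 5, 11, 10, 4),
  col3 = as.Date(c("2015-04-11", "2016-04-11", "2017-04-11",
                   "2018-04-11", "2019-04-11"))
)

# "-col1", "col3", " col2": a numeric column can be negated directly.
idx_a <- order(-dfc$col1, dfc$col3, dfc$col2)

# "col1", "-col3", " col2": negate the numeric key of the Date column instead.
idx_b <- order(dfc$col1, -xtfrm(dfc$col3), dfc$col2)

print(dfc[idx_a, ])   # rows come out in the order 2, 3, 4, 5, 1
print(dfc[idx_b, ])   # rows come out in the order 1, 5, 4, 3, 2
```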

I guess it's just a matter of taste.
