Ordering a data frame using three variables at the same time

Time:09-07

I would like to order a table using several variables, as follows:

  • in increasing order, taking the three variables into account at the same time, rather than sorting by the first variable, then the second, and finally the last, as arrange(col1, col2, col3) does.

Below is an example of what I mean. If it is not clear enough, don't hesitate to ask for more detail. In the expected output, row 5 (19, 5, 10) comes before row 6 (6, NA, NA) because 5 is lower than 6.

set.seed(123)
data <- data.frame(col1=sample(1:20,10), col2=c(sample(1:20,5), NA, NA, NA, NA, NA), col3=c(sample(1:20,5), NA, NA, NA, NA, NA))

  col1 col2 col3
1    15   14    7
2    19    5   10
3    14    9    9
4     3    3    4
5    10    8   14
6     2   NA   NA
7     6   NA   NA
8    11   NA   NA
9     5   NA   NA
10    4   NA   NA

> data_output
   col1 col2 col3
1     2   NA   NA
2     3    3    4
3     4   NA   NA
4     5   NA   NA
5    19    5   10
6     6   NA   NA
7    15   14    7
8    10    8   14
9    14    9    9
10   11   NA   NA
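
For contrast, here is a minimal sketch of what a plain column-by-column sort does with the same data. The values are copied from the table printed above rather than regenerated with sample(), and the name lexicographic is only an illustrative choice.

```r
# Data copied from the table printed in the question.
data <- data.frame(
  col1 = c(15, 19, 14, 3, 10, 2, 6, 11, 5, 4),
  col2 = c(14, 5, 9, 3, 8, NA, NA, NA, NA, NA),
  col3 = c(7, 10, 9, 4, 14, NA, NA, NA, NA, NA)
)

# Column-by-column ordering: col1 decides, col2 and col3 only break ties.
lexicographic <- data[order(data$col1, data$col2, data$col3), ]

# The row (19, 5, 10) ends up last, because only col1 = 19 is compared.
tail(lexicographic, 1)
```

This is why arrange(col1, col2, col3) does not give the wanted result: the 5 in col2 never gets a chance to pull that row forward.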

Any ideas?

Thank you.

CodePudding user response:

Take the minimum value per row, and order it:

data[order(apply(data, 1, min, na.rm = TRUE)), ]

or, with pmin:

data[order(do.call(pmin, c(data, na.rm = TRUE))), ]

or with matrixStats::rowMins:

library(matrixStats)
data[order(rowMins(as.matrix(data), na.rm = TRUE)), ]

   col1 col2 col3
6     2   NA   NA
4     3    3    4
10    4   NA   NA
2    19    5   10
9     5   NA   NA
7     6   NA   NA
1    15   14    7
5    10    8   14
3    14    9    9
8    11   NA   NA
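
Note that rows 2 and 9 both have a minimum of 5, and order() keeps tied rows in their original order. If the exact row order of data_output in the question matters, one possible sketch is to add a tie-breaker. Using col1 as the second key is an assumption on my part (the question does not say how ties should be resolved), and row_min and result are illustrative names.

```r
# Data copied from the table printed in the question.
data <- data.frame(
  col1 = c(15, 19, 14, 3, 10, 2, 6, 11, 5, 4),
  col2 = c(14, 5, 9, 3, 8, NA, NA, NA, NA, NA),
  col3 = c(7, 10, 9, 4, 14, NA, NA, NA, NA, NA)
)

# Row-wise minimum, ignoring NA values.
row_min <- do.call(pmin, c(data, na.rm = TRUE))

# Sort by the row minimum first, then by col1 to break ties (assumed rule).
result <- data[order(row_min, data$col1), ]
result
```

With this tie-breaker, (5, NA, NA) comes before (19, 5, 10), which matches the order shown in data_output.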

CodePudding user response:

The core idea would be to arrange by the minimum value per row. Here using dplyr:

library(dplyr)

data |>
  rowwise() |>
  mutate(min_sort = min(c_across(everything()), na.rm = TRUE)) |>
  ungroup() |>
  arrange(min_sort)  # append |> select(-min_sort) to drop the helper column

Output:

# A tibble: 10 × 4
    col1  col2  col3 min_sort
   <int> <int> <int>    <int>
 1     2    NA    NA        2
 2     3     3     4        3
 3     4    NA    NA        4
 4    19     5    10        5
 5     5    NA    NA        5
 6     6    NA    NA        6
 7    15    14     7        7
 8    10     8    14        8
 9    14     9     9        9
10    11    NA    NA       11
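
As a possible shorter variant, arrange() also accepts an expression, so the row minimum can be computed with pmin() directly inside it. This is only a sketch: it names the three columns explicitly, so it assumes the column set is known in advance, and sorted is an illustrative name.

```r
library(dplyr)

# Data copied from the table printed in the question.
data <- data.frame(
  col1 = c(15, 19, 14, 3, 10, 2, 6, 11, 5, 4),
  col2 = c(14, 5, 9, 3, 8, NA, NA, NA, NA, NA),
  col3 = c(7, 10, 9, 4, 14, NA, NA, NA, NA, NA)
)

# pmin() is vectorised, so no rowwise() step or helper column is needed.
sorted <- data |>
  arrange(pmin(col1, col2, col3, na.rm = TRUE))
sorted
```

Since pmin() works on whole columns at once, this should also scale better than rowwise() on large data frames.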

Update: @Maël beat me by 18 seconds, but now you have the same core idea expressed in different ways.
