Sorting objects by size with unit denominations in a vector or data frame in R

Time:03-16

I am trying to sort a vector of object sizes in descending order and put the result in a data frame. The sorting fails because the values carry unit denominations (e.g. Kb, Mb): they are stored as character strings, so they are sorted alphabetically rather than by size. How can I sort them in ascending or descending order by their actual size?

Example 1:

library(dplyr)

l <- list(1:1e6, 1:1e1, 1:1e3, 1:1e7)
l <- sapply(
    l,
    function(x){
        object.size(x) %>% format(units = "auto")
    }
)

# Alt. A: Sorting the vector before coercing to dataframe
sort(l) %>% as.data.frame() 

A data.frame: 4 × 1
.
<chr>
96 bytes
4 Kb
38.1 Mb
3.8 Mb

# Alt. B: Coerce to dataframe then sort using arrange()
as.data.frame(l) %>% arrange(desc(names(.)[1]))

A data.frame: 4 × 1
l
<chr>
3.8 Mb
96 bytes
4 Kb
38.1 Mb

Desired output:

A data.frame: 4 × 1
l
<chr>
38.1 Mb
3.8 Mb
4 Kb
96 bytes

CodePudding user response:

The problem is that your sapply() call keeps only the formatted output, which is a character string and much harder to sort. With purrr you can store two values per element, the raw size and the formatted size, in a tibble and bind the results together, then sort on the raw size:

library(dplyr)

l <- list(1:1e6, 1:1e1, 1:1e3, 1:1e7)
l_1 <- purrr::map_df(l, function(x) {
  tibble(
    size_raw = object.size(x),
    size = size_raw %>% format(units = "auto")
  )
})

l_1 %>% 
  arrange(-size_raw)
#> # A tibble: 4 × 2
#>   size_raw       size    
#>   <objct_sz>     <chr>   
#> 1 40000048 bytes 38.1 Mb 
#> 2 4000048 bytes  3.8 Mb  
#> 3 4048 bytes     4 Kb    
#> 4 96 bytes       96 bytes

Created on 2022-03-16 by the reprex package (v2.0.1)
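
The same idea (sort on the raw byte count, format only for display) can also be sketched in base R without purrr or dplyr. This is a minimal sketch rather than part of the answer above; the names sz, bytes, ord and sizes are made up for illustration.

```r
# Same example objects as in the question
l <- list(1:1e6, 1:1e1, 1:1e3, 1:1e7)

# Keep the object_size values so both views can be derived from them
sz <- lapply(l, object.size)

# Raw sizes as plain numbers (bytes); these sort numerically
bytes <- vapply(sz, as.numeric, numeric(1))

# Human-readable strings, used for display only
formatted <- vapply(sz, format, character(1), units = "auto")

# Order by raw bytes, largest first, then apply that order to the strings
ord <- order(bytes, decreasing = TRUE)
sizes <- data.frame(size = formatted[ord], stringsAsFactors = FALSE)

print(sizes)
```

Dropping decreasing = TRUE gives ascending order. The key point is that order() is computed on the numeric vector, never on the formatted strings.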
