How can I convert numbers into abbreviated terms in text?


My data frame looks like this:

location      population        
Canada           38067913       
China           1444216102      
Mexico          130262220      

I need to use mutate() to turn the population numbers into a new variable written in abbreviated terms, like this:

location      population        pop_text
Canada           38067913        38.06 millions
China           1444216102       1.44 billions
Mexico          130262220       130.26 millions

Answer:

My approach would be to throw together some nested ifelse() statements if there are only a handful of alternatives to deal with (as is the case with population size).

ifelse(x >= 1e12, sprintf("%.2f trillion", x / 1e12),
  ifelse(x >= 1e9, sprintf("%.2f billion", x / 1e9),
    ifelse(x >= 1e6, sprintf("%.2f million", x / 1e6),
      format(x, big.mark = ",", trim = TRUE))))

If df is your data frame, then yours would be:

df$pop_text <-
  ifelse(df$population >= 1e12, sprintf("%.2f trillion", df$population / 1e12),
    ifelse(df$population >= 1e9, sprintf("%.2f billion", df$population / 1e9),
      ifelse(df$population >= 1e6, sprintf("%.2f million", df$population / 1e6),
        format(df$population, big.mark = ",", trim = TRUE))))
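With more than a handful of units the nesting gets deep. One alternative, not part of the answer above, is to look the unit up with base R's findInterval(). The helper name abbreviate_number() is hypothetical, and the sketch assumes population is a numeric vector without NAs:

```r
# Hypothetical helper: picks the unit with findInterval() instead of
# nested ifelse() calls. Assumes x is numeric and contains no NAs.
abbreviate_number <- function(x, digits = 2) {
  thresholds <- c(1e6, 1e9, 1e12)
  labels <- c("million", "billion", "trillion")
  # idx is 0 below 1e6, 1 in [1e6, 1e9), 2 in [1e9, 1e12), 3 from 1e12 up
  idx <- findInterval(x, thresholds)
  out <- character(length(x))
  small <- idx == 0
  # Values under a million keep the comma-separated form
  out[small] <- format(x[small], big.mark = ",", trim = TRUE)
  # Larger values are divided by their threshold and labelled
  out[!small] <- sprintf(paste0("%.", digits, "f %s"),
                         x[!small] / thresholds[idx[!small]],
                         labels[idx[!small]])
  out
}

df <- data.frame(
  location = c("Canada", "China", "Mexico"),
  population = c(38067913, 1444216102, 130262220)
)
df$pop_text <- abbreviate_number(df$population)
df$pop_text
```

Adding a unit only means extending thresholds and labels. Because the helper is vectorised, it should also work inside dplyr's mutate(), e.g. mutate(df, pop_text = abbreviate_number(population)).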

ifelse() is vectorised: it evaluates the condition in the first argument (e.g. is x at least 1 trillion?) for every element of x, and returns the corresponding element of the second argument wherever the condition is TRUE and of the third argument wherever it is FALSE. Placing another ifelse() in the third argument means the elements that failed the first test are checked against the next threshold, and so on down to the format() fallback.
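A small trace of that behaviour, using made-up values chosen to hit each branch:

```r
x <- c(5e5, 2e6, 3e9)
# Outer test, evaluated for every element at once
x >= 1e9
# Elements failing the outer test take their value from the inner ifelse(),
# which is computed for the whole vector and then subset by position
res <- ifelse(x >= 1e9, "billions", ifelse(x >= 1e6, "millions", "smaller"))
res
```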

Answer:

With a little finagling you could hijack how format.object_size prints:

convert_numbers_into_abbreviated_terms_in_text <- function(x, digits = 1L) {
  # SI size units (powers of 1000), named by the word that replaces them
  key <- setNames(
    c('B', 'kB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'),
    c('', 'thousands', 'millions', 'billions', 'trillions',
      'quadrillions', 'quintillions', 'sextillions', 'septillions')
  )
  sapply(x, function(xx) {
    # Let format.object_size() choose the unit and round the number
    xx <- format(
      structure(xx, class = 'object_size'), units = 'auto', standard = 'SI',
      digits = digits
    )
    # Split "38.1 MB" into number and unit, then swap the unit for its word
    xx <- strsplit(xx, ' ')[[1L]]
    trimws(paste(xx[1L], names(key)[match(xx[2L], key)], collapse = ' '))
  })
}

x <- read.table(header = TRUE, text = "location      population        
Canada           38067913       
China           1444216102      
Mexico          130262220")

convert_numbers_into_abbreviated_terms_in_text(x$population)
# [1] "38.1 millions"  "1.4 billions"   "130.3 millions"
convert_numbers_into_abbreviated_terms_in_text(x$population, digits = 2)
# [1] "38.07 millions"  "1.44 billions"   "130.26 millions"

si <- 1000^(0:8)
convert_numbers_into_abbreviated_terms_in_text(si)
# [1] "1"              "1 thousands"    "1 millions"     "1 billions"     "1 trillions"    "1 quadrillions" "1 quintillions"
# [8] "1 sextillions"  "1 septillions" 
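To see what the function is rewriting, here is the intermediate string that format.object_size() produces for one value when SI units (powers of 1000) are requested. This sketch assumes an R version whose format.object_size() accepts the standard argument, as the function above does:

```r
# One population value dressed up as an object_size
s <- structure(38067913, class = "object_size")
# standard = "SI" means kB, MB, GB, ... are powers of 1000, not 1024
raw <- format(s, units = "auto", standard = "SI", digits = 2)
raw    # "38.07 MB"
# The function splits at the space and maps "MB" to "millions"
parts <- strsplit(raw, " ")[[1L]]
parts  # c("38.07", "MB")
```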

Answer:

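We could use switch(), picking the unit by comparing the number with successive powers of 1000 (10^0, 10^3, 10^6, 10^9). The code uses the native pipe |> and the \(x) lambda syntax, so it needs R 4.1 or later.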

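f <- Vectorize(function(x, digits = 2) {
  # x expressed in units, thousands, millions and billions
  u <- mapply(`^`, 10, 0:3*3)^-1 * x
  # the number of scaled values above 1 selects the unit word
  o <- sum(u > 1) |>
    (\(x) sapply(x, \(i) {
      switch(i, '', 'thousands', 'millions', 'billions')
    }))()
  paste(round(u[u < 1000 & u > 1], digits), o)
})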

transform(dat, popText=f(population))
#   location population         popText
# 1  Andorra      77443 77.44 thousands
# 2   Canada   38067913  38.07 millions
# 3    China 1444216102   1.44 billions
# 4   Mexico  130262220 130.26 millions
# 5    Earth 7577130400   7.58 billions

Data:
dat <- structure(list(location = c("Andorra", "Canada", "China", "Mexico", 
"Earth"), population = c(77443, 38067913, 1444216102, 130262220, 
7577130400)), class = "data.frame", row.names = c(NA, -5L))
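Tracing the unit choice inside f() for a single value (Canada's population) shows why the switch() index works: the count of scaled values above 1 picks the branch, and the one scaled value between 1 and 1000 is the number that gets printed.

```r
x <- 38067913
# x expressed in units, thousands, millions and billions
u <- mapply(`^`, 10, 0:3 * 3)^-1 * x
u
# Three scaled values exceed 1, so switch() takes its third branch
i <- sum(u > 1)
unit <- switch(i, "", "thousands", "millions", "billions")
# Only the value in millions lies between 1 and 1000
out <- paste(round(u[u < 1000 & u > 1], 2), unit)
out    # "38.07 millions"
```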
