My data frame looks like this:
location population
Canada 38067913
China 1444216102
Mexico 130262220
I need to mutate() the population numbers into a new variable that holds them as abbreviated text, like this:
location population pop_text
Canada 38067913 38.06 millions
China 1444216102 1.44 billions
Mexico 130262220 130.26 millions
CodePudding user response:
My approach would be to throw together some nested ifelse() statements if there are only a handful of alternatives to deal with (as is the case with population size).
ifelse(x>=1e12, sprintf("%.2f trillion", x/1e12),
ifelse(x>=1e9, sprintf("%.2f billion", x/1e9),
ifelse(x>=1e6, sprintf("%.2f million", x/1e6),
format(x, big.mark=","))))
If df is your data.frame, then yours would be:
df$pop_text <-
ifelse(df$population>=1e12, sprintf("%.2f trillion", df$population/1e12),
ifelse(df$population>=1e9, sprintf("%.2f billion", df$population/1e9),
ifelse(df$population>=1e6, sprintf("%.2f million", df$population/1e6),
format(df$population, big.mark=","))))
ifelse() is vectorized: for each value of x it checks the condition in the first argument (e.g. is x at least 1 trillion?), returns the corresponding value from the second argument if the condition is true, and returns the corresponding value from the third argument if it is false. Placing an ifelse() call in the third argument of another means the inner ifelse() decides the result wherever the outer condition is false.
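Since the question asks for something usable inside mutate(), here is a minimal sketch that wraps the nested ifelse() calls in a function. The name abbreviate_pop is hypothetical, and the trim and scientific arguments to format() are an addition not in the answer above: format() otherwise pads a vector to a common width, which would leave leading spaces in the fallback values.

```r
# Hypothetical helper; thresholds and labels follow the nested ifelse() above.
abbreviate_pop <- function(x) {
  ifelse(x >= 1e12, sprintf("%.2f trillion", x / 1e12),
  ifelse(x >= 1e9,  sprintf("%.2f billion",  x / 1e9),
  ifelse(x >= 1e6,  sprintf("%.2f million",  x / 1e6),
         # Fallback for values under one million: plain number with commas.
         format(x, big.mark = ",", trim = TRUE, scientific = FALSE))))
}

# The data frame from the question.
df <- data.frame(
  location   = c("Canada", "China", "Mexico"),
  population = c(38067913, 1444216102, 130262220)
)

df$pop_text <- abbreviate_pop(df$population)
df$pop_text
# "38.07 million", "1.44 billion", "130.26 million"
```

Assuming dplyr is installed, the same helper should drop straight into the pipeline from the question as dplyr::mutate(df, pop_text = abbreviate_pop(population)).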
CodePudding user response:
With a little finagling you could hijack how format.object_size prints:
convert_numbers_into_abbreviated_terms_in_text <- function(x, digits = 1L) {
  key <- setNames(
    c('B', 'kB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'),
    c('', 'thousands', 'millions', 'billions', 'trillions',
      'quadrillions', 'quintillions', 'sextillions', 'septillions')
  )
  sapply(x, function(xx) {
    xx <- format(
      structure(xx, class = 'object_size'), units = 'auto', standard = 'SI',
      digits = digits
    )
    xx <- strsplit(xx, ' ')[[1L]]
    trimws(paste(xx[1L], names(key)[match(xx[2L], key)], collapse = ' '))
  })
}
x <- read.table(header = TRUE, text = "location population
Canada 38067913
China 1444216102
Mexico 130262220")
convert_numbers_into_abbreviated_terms_in_text(x$population)
# [1] "38.1 millions" "1.4 billions" "130.3 millions"
convert_numbers_into_abbreviated_terms_in_text(x$population, digits = 2)
# [1] "38.07 millions" "1.44 billions" "130.26 millions"
si <- 1000^(0:8)
convert_numbers_into_abbreviated_terms_in_text(si)
# [1] "1" "1 thousands" "1 millions" "1 billions" "1 trillions" "1 quadrillions" "1 quintillions"
# [8] "1 sextillions" "1 septillions"
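To see what the function above is relabelling, it may help to look at the intermediate strings that format() produces once a number is tagged with the object_size class. This is only an illustrative sketch; pops and as_size are names introduced here, and each value is formatted separately to mirror the sapply() in the answer.

```r
# Treat each population as if it were a byte count (the "hijack").
pops <- c(38067913, 1444216102, 130262220)

# Format each value separately, mirroring the sapply() in the answer.
as_size <- vapply(pops, function(p) {
  format(structure(p, class = "object_size"),
         units = "auto", standard = "SI", digits = 2L)
}, character(1))

as_size
# "38.07 MB", "1.44 GB", "130.26 MB"
```

The answer's key vector then swaps the SI byte suffix (MB, GB, ...) for the matching word (millions, billions, ...).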
CodePudding user response:
We could use a switch based on a comparison with the respective powers of 10.
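f <- Vectorize(function(x, digits=2) {
  ## x expressed in units, thousands, millions, billions
  u <- x / 10^(0:3*3)
  ## the number of scales at which x is at least 1 selects the label
  ## (assumes 1 <= x < 1e12)
  o <- switch(sum(u >= 1), '', 'thousands', 'millions', 'billions')
  ## keep the scaled value that falls in [1, 1000)
  paste(round(u[u < 1000 & u >= 1], digits), o)
})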
transform(dat, popText=f(population))
# location population popText
# 1 Andorra 77443 77.44 thousands
# 2 Canada 38067913 38.07 millions
# 3 China 1444216102 1.44 billions
# 4 Mexico 130262220 130.26 millions
# 5 Earth 7577130400 7.58 billions
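The unit-picking step in this answer can be traced by hand for a single value. The sketch below is only an illustration of that idea with Canada's population; the variable names are introduced here and are not part of the answer.

```r
x <- 38067913

# x expressed in units, thousands, millions and billions:
# 38067913, 38067.913, 38.067913, 0.038067913
u <- x / 10^(0:3 * 3)

# Three of the four scaled values exceed 1, so the third label is chosen.
i <- sum(u > 1)
unit <- switch(i, "", "thousands", "millions", "billions")

# The scaled value at that same position is the one between 1 and 1000.
result <- paste(round(u[i], 2), unit)
result
# "38.07 millions"
```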
Data:
dat <- structure(list(location = c("Andorra", "Canada", "China", "Mexico",
"Earth"), population = c(77443, 38067913, 1444216102, 130262220,
7577130400)), class = "data.frame", row.names = c(NA, -5L))