Scientific notation only for specific numbers of a dataset's column

Time:12-30

I need to format the numeric columns of a data frame so that scientific notation is used only for numbers less than 0.0001. I wrote the following code using the format function, but it converts every number in the column to scientific notation.

Any suggestions?

col1 <- c(0.00002, 0.0001, 0.5689785541122558)
col2 <- c(3.5, 45.6546548788, 12585.5663)
tab <- cbind(col1, col2)
tab <- as.data.frame(tab)
format(tab[1], digits = 1, nsmall = 3)

CodePudding user response:

1) dplyr. Define a vectorized format function and use it in mutate/across:

formatv <- function(x, ...) {
  mapply(format, x, scientific = abs(x) < 0.0001, ...)
}    

library(dplyr)
tab %>% mutate(across(everything(), \(x) formatv(x, digits = 1, nsmall = 3)))
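
As a quick check of what formatv returns for a single column, here is a minimal sketch using col1 from the question. The name res is only illustrative, and formatv is repeated so the snippet runs on its own.

```r
# formatv as defined above, repeated so this snippet is self-contained
formatv <- function(x, ...) {
  mapply(format, x, scientific = abs(x) < 0.0001, ...)
}

col1 <- c(0.00002, 0.0001, 0.5689785541122558)

# Each element is passed to format() separately, with its own
# scientific = TRUE/FALSE flag derived from the 0.0001 threshold
res <- formatv(col1, digits = 1, nsmall = 3)
print(res)
```

Only the first value is below the threshold, so only that one should come back in scientific notation. The result is a character vector, not a numeric one.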

2) Base R. Using only base R (formatv is from above):

replace(tab, TRUE, lapply(tab, formatv, digits = 1, nsmall = 3))

or

replace(tab, TRUE, formatv(as.matrix(tab), digits = 1, nsmall = 3))

or if you have a small number of columns do each individually

transform(tab,
  col1 = formatv(col1, digits = 1, nsmall = 3),
  col2 = formatv(col2, digits = 1, nsmall = 3))

3) collapse. formatv is from above.

library(collapse)
ftransformv(tab, names(tab), formatv, digits = 1, nsmall = 3)

4) purrr. map_dfc from purrr can also be used; formatv is from above.

library(purrr)
tab %>% map_dfc(formatv, digits = 1, nsmall = 3)

CodePudding user response:

You could use apply over both margins (1:2), which calls format on each cell separately.

as.data.frame(apply(tab, 1:2, \(x) format(x, digits=1, nsmall=3)))
#    col1      col2
# 1 2e-05     3.500
# 2 1e-04    45.655
# 3 0.569 12585.566

Or if you want to format just one specific column:

transform(tab, col1=sapply(col1, format, digits=1, nsmall=3))
#    col1        col2
# 1 2e-05     3.50000
# 2 1e-04    45.65465
# 3 0.569 12585.56630

The important point is that each element is formatted individually.
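
To illustrate why element-wise formatting matters: when format is given a whole vector, it chooses one common representation for all elements, whereas calling it per element lets each value get its own. A small sketch with col1 from the question (whole and each are illustrative names):

```r
col1 <- c(0.00002, 0.0001, 0.5689785541122558)

# Whole vector at once: a single common format is chosen for all elements
whole <- format(col1, digits = 1, nsmall = 3)

# One element at a time: each value is formatted on its own
each <- sapply(col1, format, digits = 1, nsmall = 3)

print(whole)
print(each)
```

With the whole-vector call, the very small first value pushes every element into scientific notation, which is the behaviour the question describes.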

Here is another way, using replace.

tab |> 
  round(5) |>
  (\(.) replace(., . < 1e-4, format(.[. < 1e-4], digits=1, nsmall=3)))()
#      col1        col2
# 1   2e-05     3.50000
# 2   1e-04    45.65465
# 3 0.56898 12585.56630

CodePudding user response:

lapply(tab, \(x) ifelse(abs(x) < 0.0001, format(x, scientific=TRUE), format(x, scientific=FALSE)))
# $col1
# [1] "2.000000e-05" "0.0001000"    "0.5689786"   
# $col2
# [1] "    3.50000" "   45.65465" "12585.56630"

You can reassign the result back into the frame with tab[] <- lapply(tab, ...) if you'd like. Note that all columns are then character, not numeric.
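
A minimal sketch of that reassignment, assuming the question's data. The helper name fmt is hypothetical and simply wraps the ifelse call from above.

```r
col1 <- c(0.00002, 0.0001, 0.5689785541122558)
col2 <- c(3.5, 45.6546548788, 12585.5663)
tab <- data.frame(col1, col2)

# Hypothetical helper wrapping the ifelse() call shown above
fmt <- function(x) {
  ifelse(abs(x) < 0.0001,
         format(x, scientific = TRUE),
         format(x, scientific = FALSE))
}

# tab[] keeps the data frame structure and replaces the column contents
tab[] <- lapply(tab, fmt)

# Every column is now character
print(sapply(tab, class))
```

Because the columns are character afterwards, this step is probably best left until the values are only needed for display.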

This can perhaps be done slightly more efficiently by working on the matrix, which removes the need for lapply:

tab <- cbind(col1, col2)
ifelse(tab < 0.0001,
       format(tab, digits=1, nsmall=3),
       format(tab, digits=1, nsmall=3, scientific=FALSE))
#      col1          col2         
# [1,] "2e-05"       "    3.50000"
# [2,] "    0.00010" "   45.65465"
# [3,] "    0.56898" "12585.56630"

which can then be converted into a frame.
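
One possible way to do that conversion, as a sketch: the names m and out are illustrative, and the trimws step is an optional addition (not part of the answer above) that strips the padding format adds to align values.

```r
col1 <- c(0.00002, 0.0001, 0.5689785541122558)
col2 <- c(3.5, 45.6546548788, 12585.5663)
tab <- cbind(col1, col2)

# Character matrix produced by the ifelse() call from above
m <- ifelse(tab < 0.0001,
            format(tab, digits = 1, nsmall = 3),
            format(tab, digits = 1, nsmall = 3, scientific = FALSE))

# Convert to a data frame; stringsAsFactors = FALSE keeps the columns
# as character on older R versions too
out <- as.data.frame(m, stringsAsFactors = FALSE)

# Optional: remove the leading padding spaces
out[] <- lapply(out, trimws)
print(out)
```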

Tags: r