How to apply sprintf function to r data frame to convert values to percentages while maintaining ori

Time:04-05

I'm working through the options for formatting all numeric values in a data frame, and not just selected columns. I start with the following base data frame called "c" when running the code beneath it:

> c
        A        B
1 3.412324 2.234200
2 3.245236 4.234234
    
Related code:
a <- c(3.412324463,3.2452364)
b <- c(2.2342,4.234234)
c <- data.frame(A=a, B=b, stringsAsFactors = FALSE)

Next, I round all the numbers in the above "c" data frame to 2 decimal places, resulting in data frame "d" shown below with the related code immediately underneath:

 > d
         A    B
 [1,] 3.41 2.23
 [2,] 3.25 4.23
        
 Related code:
 d <- as.data.frame(lapply(c, formatC, decimal.mark =".", format = "f", digits = 2))
 d <- sapply(d[,],as.numeric)

Last step, I'd like to express the above data frame "d" in percentages, in a new data frame called "e". I get the below results as a list, using the code shown beneath it.

> e
   X.341.0.. X.325.0.. X.223.0.. X.423.0..
1    341.0%    325.0%    223.0%    423.0%
    
Related code:
e <- as.data.frame(lapply(d*100, sprintf, fmt = "%.1f%%"))

How do I modify the code, efficiently, so that the data frame structure stays intact when deriving data frame "e", the way it does when generating data frame "d"? It would be most helpful to see a solution in both base R and dplyr.

I'm pretty sure the issue lies in my use of lapply() in creating data frame "e" (yes, by now I know lapply() spits out lists), but it worked fine in maintaining the data frame structure in creating data frame "d"!

All values in the data frame are formatted the same, so there's no need to subset the columns etc.

Answer:

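lapply() always returns a plain list. A data.frame is itself a special kind of list (one element per column), so to apply a function to every column while keeping the data frame structure, the easiest way is df[] = lapply(df, ...). Assigning into df[] replaces the column contents but keeps the existing object, including its class, column names, and row names. Your "d" kept its shape only because sapply() simplified the result to a matrix; calling lapply() on that matrix then iterates over its four individual cells, which is why "e" came out as one row with four columns.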
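As a rough illustration of the difference, the sketch below compares lapply() on a data frame with lapply() on a matrix, then applies the df[] pattern. The names df, m and out are made up for the example, and the values simply mirror the rounded numbers from the question.

```r
# Illustrative names (df, m, out); values mirror the question's rounded data.
df <- data.frame(A = c(3.41, 3.25), B = c(2.23, 4.23))
m  <- as.matrix(df)           # same shape as the result of sapply(d, as.numeric)

length(lapply(df, identity))  # 2: one list element per column
length(lapply(m, identity))   # 4: one list element per matrix cell

# Assigning into out[] keeps the data frame and only swaps the column contents.
out <- df
out[] <- lapply(df * 100, sprintf, fmt = "%.1f%%")
class(out)                    # "data.frame"
dim(out)                      # 2 2
```

Note that the columns of out are now character vectors, so any arithmetic should happen before the formatting step.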

## d: better base version
## `round` is friendlier than `formatC` or `sprintf` here - it returns numerics
d = c
d[] = lapply(d, round, 2)
d 
#      A    B
# 1 3.41 2.23
# 2 3.25 4.23

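## dplyr version
## (extra arguments passed through `across(...)` are deprecated since dplyr 1.1.0,
## so wrap the call in a function instead)
library(dplyr)
d_dplyr = c %>%
  mutate(across(everything(), function(x) round(x, 2)))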
d_dplyr
#      A    B
# 1 3.41 2.23
# 2 3.25 4.23

## e base
e = d
e[] = lapply(e * 100, sprintf, fmt = "%.1f%%")
e
#        A      B
# 1 341.0% 223.0%
# 2 325.0% 423.0%

## e dplyr
## in the tidyverse, we'll use `scales::label_percent`
## (the current name for `scales::percent_format`), which generates
## a percent conversion function according to specification
## the default already multiplies by 100 and adds the `%` sign
## we just need to specify the accuracy.
## (note that we start from `c` again, so the unrounded values are
## formatted and the results differ slightly from the base version)
e_dplyr = c %>%
  mutate(across(everything(), scales::label_percent(accuracy = 0.1)))
e_dplyr
#        A      B
# 1 341.2% 223.4%
# 2 324.5% 423.4%
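For comparison, here is a minimal base R sketch that skips the intermediate rounding and formats the original values in one step, which should match the dplyr output above. The names raw and pct are illustrative only; raw is used instead of c so that base::c is not masked.

```r
# Assumed input: the question's unrounded values, under the illustrative name `raw`.
raw <- data.frame(A = c(3.412324463, 3.2452364), B = c(2.2342, 4.234234))

# Multiply by 100 and format each column; assigning into pct[] keeps the data frame.
pct <- raw
pct[] <- lapply(raw, function(x) sprintf("%.1f%%", x * 100))
pct
#        A      B
# 1 341.2% 223.4%
# 2 324.5% 423.4%
```

Whether to round first or format the raw values directly is a design choice: rounding to two decimals before multiplying by 100 discards the digit that "%.1f" would otherwise show, which is why the earlier base result ends in ".0%".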