Home > OS >  Multiply Columns Based on Condition and Put Comma Separator Using R
Multiply Columns Based on Condition and Put Comma Separator Using R

Time:12-27

I have the following dataframe. Based on a condition -- millions or thousands present in a text -- I need to multiply the numeric column with million or thousand, and then put a thousand separator:

df <- data.frame(col1=c('', 'assets', 'loss', 'liability'),
                 col2=c(NA, 5000, -1400, 300),
                 col3=c(NA, 4500, -1100, 500))

df

col1        col2    col3
<chr>       <dbl>   <dbl>
             NA      NA
assets      5000    4500
loss       -1400   -1100
liability   300      500
                                                                  ​               ​        ​

I am trying the following script to see which condition -- million or thousand -- is present, multiply with that, and then put a thousand separator:

multiply_columns <- function(df){

  text <- "in millions, except share and per share data"

  # Iterate over the columns
  for (idx in 2:length(names(df))) {
      
      # Check if text contains million
      if (grepl(text, "in millions", fixed = TRUE)) {
        
        # If yes, multiply column values with million else thousand
        df[, idx] <- format(df[, idx]*1000000, big.mark=",", scientific=FALSE)
      
      } else 
        df[, idx] <- format(df[, idx]*1000, big.mark=",", scientific=FALSE)
}  
  return(df)
}

It seems the function isn't multiplying correctly. It's multiplying with 1000 whereas the text contains millions.

multiply_columns(df)

col1             col2          col3
<chr>           <chr>          <chr>
                 NA            NA
assets        5,000,000     4,500,000
loss         -1,400,000    -1,100,000
liability     300,000       500,000

Desired Output

col1             col2           col3
<chr>            <dbl>          <dbl>
                  NA             NA
assets        5,000,000,000   4,500,000,000
loss         -1,400,000,000   -1,100,000,000
liability     300,000,000     500,000,000

Any suggestions would be appreciated. Thanks!

CodePudding user response:

I'm more familiar with tidyverse functions, but here is how I would structure your function:

library(tidyverse)

conv_th_or_mm <- function(df, text){

  if(str_detect(text, regex("million", ignore_case=T))){

    df <- df %>%
      mutate(
        across(
          .cols = where(is.numeric),
          .fns = ~.x*1000000
        )
      )

  } else {

    df <- df %>%
      mutate(
        across(
          .cols = where(is.numeric),
          .fns = ~.x*1000
        )
      )

  }

  df <- df %>%
    mutate(
      across(
        .cols = where(is.numeric),
        .fns = ~formatC(.x, big.mark=",", digits=0, format="f")
      )
    )

  return(df)
}

A few points:

  • I added text in as an argument of the function, which I think you will want to pass in depending on the dataframe.
  • I've only used formatC() before, so I've used equivalent parameters to match your format() function.
  • I gave your function a slightly different name, which maybe isn't important now but could be if you next look at it a year from now and there are two dozen other functions you've written.
  • If your use case calls for it, you could be more specific with .cols but I assumed all the numeric values needed to be converted.

Editing to add that R is very flexible for where it goes to look for variables you call. It is possible that R could go look for text in the global environment if you call it within your function despite not passing it in as an argument (which is strange behavior for some other languages). However, if the data outside of your function is my_df and your text you're evaluating is my_text, you would use my function like this:

my_df <- conv_th_or_mm(df = my_df, text = my_text)

CodePudding user response:

We may extract the thousand, million and replace with 1e3 and 1e6 to multiply. Assuming that the text can be input to the function

library(dplyr)
multiply_columns <- function(df, text){

 val_to_mult <- as.numeric(stringr::str_replace_all(stringr::str_extract(text, 
      "million|thousand|billion"),  setNames(c("1e3", "1e6", "1e9"), 
            c("thousand", "million", "billion"))))
 df %>%
    mutate(across(where(is.numeric), ~ scales::comma(.x * val_to_mult)))
    }

-testing

> text <- "in millions, except share and per share data"
> multiply_columns(df, text)
     col1           col2           col3
1                     <NA>           <NA>
2    assets  5,000,000,000  4,500,000,000
3      loss -1,400,000,000 -1,100,000,000
4 liability    300,000,000    500,000,000

> multiply_columns(df, "in thousands")
    col1       col2       col3
1                 <NA>       <NA>
2    assets  5,000,000  4,500,000
3      loss -1,400,000 -1,100,000
4 liability    300,000    500,000

> multiply_columns(df, "in billion")
    col1               col2               col3
1                         <NA>               <NA>
2    assets  5,000,000,000,000  4,500,000,000,000
3      loss -1,400,000,000,000 -1,100,000,000,000
4 liability    300,000,000,000    500,000,000,000
  • Related