I have the following dataframe. Based on a condition -- millions or thousands present in a text -- I need to multiply the numeric column with million or thousand, and then put a thousand separator:
df <- data.frame(col1=c('', 'assets', 'loss', 'liability'),
col2=c(NA, 5000, -1400, 300),
col3=c(NA, 4500, -1100, 500))
df
col1 col2 col3
<chr> <dbl> <dbl>
NA NA
assets 5000 4500
loss -1400 -1100
liability 300 500
I am trying the following script to see which condition -- million or thousand -- is present, multiply with that, and then put a thousand separator:
multiply_columns <- function(df){
text <- "in millions, except share and per share data"
# Iterate over the columns
for (idx in 2:length(names(df))) {
# Check if text contains million
if (grepl(text, "in millions", fixed = TRUE)) {
# If yes, multiply column values with million else thousand
df[, idx] <- format(df[, idx]*1000000, big.mark=",", scientific=FALSE)
} else
df[, idx] <- format(df[, idx]*1000, big.mark=",", scientific=FALSE)
}
return(df)
}
It seems the function isn't multiplying correctly. It's multiplying with 1000 whereas the text contains millions.
multiply_columns(df)
col1 col2 col3
<chr> <chr> <chr>
NA NA
assets 5,000,000 4,500,000
loss -1,400,000 -1,100,000
liability 300,000 500,000
Desired Output
col1 col2 col3
<chr> <dbl> <dbl>
NA NA
assets 5,000,000,000 4,500,000,000
loss -1,400,000,000 -1,100,000,000
liability 300,000,000 500,000,000
Any suggestions would be appreciated. Thanks!
CodePudding user response:
I'm more familiar with tidyverse functions, but here is how I would structure your function:
library(tidyverse)
conv_th_or_mm <- function(df, text){
if(str_detect(text, regex("million", ignore_case=T))){
df <- df %>%
mutate(
across(
.cols = where(is.numeric),
.fns = ~.x*1000000
)
)
} else {
df <- df %>%
mutate(
across(
.cols = where(is.numeric),
.fns = ~.x*1000
)
)
}
df <- df %>%
mutate(
across(
.cols = where(is.numeric),
.fns = ~formatC(.x, big.mark=",", digits=0, format="f")
)
)
return(df)
}
A few points:
- I added
text
in as an argument of the function, which I think you will want to pass in depending on the dataframe. - I've only used formatC() before, so I've used equivalent parameters to match your format() function.
- I gave your function a slightly different name, which maybe isn't important now but could be if you next look at it a year from now and there are two dozen other functions you've written.
- If your use case calls for it, you could be more specific with .cols but I assumed all the numeric values needed to be converted.
Editing to add that R is very flexible for where it goes to look for variables you call. It is possible that R could go look for text
in the global environment if you call it within your function despite not passing it in as an argument (which is strange behavior for some other languages). However, if the data outside of your function is my_df and your text you're evaluating is my_text, you would use my function like this:
my_df <- conv_th_or_mm(df = my_df, text = my_text)
CodePudding user response:
We may extract the thousand
, million
and replace with 1e3
and 1e6
to multiply. Assuming that the text
can be input to the function
library(dplyr)
multiply_columns <- function(df, text){
val_to_mult <- as.numeric(stringr::str_replace_all(stringr::str_extract(text,
"million|thousand|billion"), setNames(c("1e3", "1e6", "1e9"),
c("thousand", "million", "billion"))))
df %>%
mutate(across(where(is.numeric), ~ scales::comma(.x * val_to_mult)))
}
-testing
> text <- "in millions, except share and per share data"
> multiply_columns(df, text)
col1 col2 col3
1 <NA> <NA>
2 assets 5,000,000,000 4,500,000,000
3 loss -1,400,000,000 -1,100,000,000
4 liability 300,000,000 500,000,000
> multiply_columns(df, "in thousands")
col1 col2 col3
1 <NA> <NA>
2 assets 5,000,000 4,500,000
3 loss -1,400,000 -1,100,000
4 liability 300,000 500,000
> multiply_columns(df, "in billion")
col1 col2 col3
1 <NA> <NA>
2 assets 5,000,000,000,000 4,500,000,000,000
3 loss -1,400,000,000,000 -1,100,000,000,000
4 liability 300,000,000,000 500,000,000,000