Recode monetary string values into new variable as numeric

Time:07-02

First off - newbie with R so bear with me. I'm trying to recode string values as numeric. My problem is I have two different string patterns present in my values: "M" and "B" for 'million' and 'billion', respectively.

df <- data.frame(funds = c("$1.76M", "$2B", "$57M", "$9.87B"))

I've successfully knocked off the dollar sign and now have:

df <- data.frame(funds = c("$1.76M", "$2B", "$57M", "$9.87B"),
                 fundsR = c("1.76M", "2B", "57M", "9.87B")
                 )

How can I recode these as numeric while retaining their respective monetary values? I've tried using various if statements, for loops, with or without str_detect, pipe operators, case_when, mutate, etc. to isolate values with "M" and values with "B", convert to numeric and multiply to come up with the corresponding numeric value, all in a new column. This seemingly simple task turned out to be harder than I imagined, which I attribute to being a novice. At this point I'd like to start from scratch and see if anyone has any fresh ideas. My RStudio is a MESS.

Something like this would be nice:

df <- data.frame(funds = c("$1.76M", "$2B", "$57M", "$9.87B"),
                 fundsR = c("1.76M", "2B", "57M", "9.87B"),
                 fundsFinal = c(1760000, 2000000000, 57000000, 9870000000)
                 )

I'd really appreciate your input.

CodePudding user response:

You could create a helper function f, and then apply it to the funds column:


library(dplyr)
library(stringr)

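f <- function(x) {
  # multiplier for each suffix
  curr = c("M" = 1e6, "B" = 1e9)
  # drop the leading dollar sign
  val = str_remove(x, "\\$")
  # numeric part times the multiplier looked up by suffix;
  # unname() drops the "M"/"B" names carried over from curr
  unname(as.numeric(str_remove_all(val, "B|M")) * curr[str_extract(val, "B|M")])
}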

df %>% mutate(fundsFinal = f(funds))

Output:

   funds fundsFinal
1 $1.76M   1.76e+06
2    $2B   2.00e+09
3   $57M   5.70e+07
4 $9.87B   9.87e+09

Input:

df = structure(list(funds = c("$1.76M", "$2B", "$57M", "$9.87B")), class = "data.frame", row.names = c(NA, 
-4L))
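
For comparison, here is a minimal base R sketch of the same lookup idea that needs no packages. The helper name parse_funds is made up for illustration and is not part of the answer above. It assumes every value ends in "M" or "B"; anything else comes back as NA.

```r
# Hypothetical helper (not from the original answer), base R only.
parse_funds <- function(x) {
  # multiplier for each suffix
  multipliers <- c(M = 1e6, B = 1e9)
  # last character if it is M or B; otherwise sub() returns x unchanged
  suffix <- sub("^.*([MB])$", "\\1", x)
  # strip the dollar sign and the suffix letters, then convert to numeric
  number <- as.numeric(gsub("[$MB]", "", x))
  # unknown suffixes index to NA, so the result is NA for those rows
  unname(number * multipliers[suffix])
}

df <- data.frame(funds = c("$1.76M", "$2B", "$57M", "$9.87B"))
df$fundsFinal <- parse_funds(df$funds)

# Display only: show the values without scientific notation
format(df$fundsFinal, big.mark = ",", scientific = FALSE)
```

The values are stored as ordinary doubles either way; scientific notation only affects how they are printed.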

CodePudding user response:

This works but I'm sure better solutions exist. Assuming funds is a character vector:

library(tidyverse)
options(scipen = 999)
df <- data.frame(funds = c('$1.76M', '$2B', '$57M', '$9.87B'))


df = df %>%
  # the last character is the suffix: "M" means millions, anything else is treated as billions
  mutate(fundsFinal = ifelse(str_sub(funds, -1) == 'M',
                             as.numeric(substr(funds, 2, nchar(funds) - 1)) * 10^6,
                             as.numeric(substr(funds, 2, nchar(funds) - 1)) * 10^9))
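
Since the question mentions case_when, here is a rough sketch of that variant. Unlike the ifelse above, it gives NA for any suffix other than "M" or "B" rather than treating it as billions. The intermediate column name amount is my own choice for illustration, and str_ends assumes a reasonably recent stringr (1.4.0 or later).

```r
library(dplyr)
library(stringr)

df <- data.frame(funds = c("$1.76M", "$2B", "$57M", "$9.87B"))

df <- df %>%
  mutate(
    # numeric part: drop the dollar sign and the suffix letter
    # ("amount" is a hypothetical helper column)
    amount = as.numeric(str_remove_all(funds, "\\$|[MB]")),
    # pick the multiplier from the trailing letter
    fundsFinal = case_when(
      str_ends(funds, "M") ~ amount * 1e6,
      str_ends(funds, "B") ~ amount * 1e9,
      TRUE ~ NA_real_   # unknown suffix: leave as missing
    )
  )
```

Adding another suffix, such as "K" for thousands, would mean one more case_when line plus the extra letter in the character class.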