First off - newbie with R so bear with me. I'm trying to recode string values as numeric. My problem is I have two different string patterns present in my values: "M" and "B" for 'million' and 'billion', respectively.
df <- data.frame(funds = c("$1.76M", "$2B", "$57M", "$9.87B"))
I've successfully knocked off the dollar sign and now have:
df <- data.frame(funds = c("$1.76M", "$2B", "$57M", "$9.87B"),
                 fundsR = c("1.76M", "2B", "57M", "9.87B")
)
How can I recode these as numeric while retaining their respective monetary values? I've tried various if statements and for loops, with or without str_detect, pipe operators, case_when, mutate, etc., to isolate values with "M" and values with "B", convert them to numeric, and multiply to come up with the corresponding numeric value, all in a new column. This seemingly simple task turned out to be less simple than I imagined, which I'd attribute to being a novice. At this point I'd like to start from scratch and see if anyone has any fresh ideas. My RStudio is a MESS.
Something like this would be nice:
df <- data.frame(funds = c("$1.76M", "$2B", "$57M", "$9.87B"),
                 fundsR = c("1.76M", "2B", "57M", "9.87B"),
                 fundsFinal = c(1760000, 2000000000, 57000000, 9870000000)
)
I'd really appreciate your input.
CodePudding user response:
You could create a helper function f, and then apply it to the funds column:
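library(dplyr)
library(stringr)
f <- function(x) {
  # multiplier for each suffix
  curr <- c("M" = 1e6, "B" = 1e9)
  # drop the leading dollar sign
  val <- str_remove(x, "\\$")
  # numeric part times the multiplier looked up by suffix
  as.numeric(str_remove_all(val, "B|M")) * curr[str_extract(val, "B|M")]
}
df %>% mutate(fundsFinal = f(funds))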
Output:
   funds fundsFinal
1 $1.76M   1.76e+06
2    $2B   2.00e+09
3   $57M   5.70e+07
4 $9.87B   9.87e+09
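The output above is in scientific notation, which is only how R prints large numbers by default; the stored values are the full numbers. A minimal sketch, assuming you only want plain digits for display (the vector below is typed in by hand to mirror the output, and the name plain is just for illustration):

```r
# Values mirroring the fundsFinal column above (entered by hand for this sketch)
fundsFinal <- c(1.76e6, 2e9, 5.7e7, 9.87e9)

# format() builds display strings; it does not change the numeric values
plain <- format(fundsFinal, scientific = FALSE, big.mark = ",", trim = TRUE)
print(plain)
```

Since format() returns character strings, keep the numeric column for calculations and use the formatted version only for printing.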
Input:
df = structure(list(funds = c("$1.76M", "$2B", "$57M", "$9.87B")), class = "data.frame", row.names = c(NA,
-4L))
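If you would rather not load any packages, the same lookup idea can be sketched in base R. This is only an illustration under the assumption that every value looks like "$<number><M or B>"; the function name convert_funds is made up for this example.

```r
# Hypothetical helper (name chosen for this sketch), base R only
convert_funds <- function(x) {
  multipliers <- c(M = 1e6, B = 1e9)
  # the last character of each string is the unit suffix
  suffix <- substring(x, nchar(x))
  # strip the leading "$" and a trailing "M" or "B", then parse the number
  value <- as.numeric(gsub("^\\$|[MB]$", "", x))
  # a suffix missing from the lookup indexes to NA, so the result is NA
  unname(value * multipliers[suffix])
}

df <- data.frame(funds = c("$1.76M", "$2B", "$57M", "$9.87B"))
df$fundsFinal <- convert_funds(df$funds)
```

A value with some other suffix, such as "$5K", comes back as NA (with a coercion warning) instead of a silently wrong number.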
CodePudding user response:
This works but I'm sure better solutions exist. Assuming funds is a character vector:
library(tidyverse)
options(scipen = 999)  # print full numbers instead of scientific notation
df <- data.frame(funds = c('$1.76M', '$2B', '$57M', '$9.87B'))
df <- df %>%
  mutate(fundsFinal = ifelse(
    str_sub(funds, -1) == 'M',  # the last character is the suffix
    as.numeric(substr(funds, 2, nchar(funds) - 1)) * 10^6,  # millions
    as.numeric(substr(funds, 2, nchar(funds) - 1)) * 10^9   # anything else is treated as billions
  ))
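One possible variation on the same idea, sketched here rather than offered as the better solution: readr's parse_number() pulls the number out of the string, and case_when() picks the multiplier, so the substr() call is not repeated. The intermediate column name amount is an assumption made for this example.

```r
library(dplyr)
library(readr)
library(stringr)

df <- data.frame(funds = c("$1.76M", "$2B", "$57M", "$9.87B"))

df <- df %>%
  mutate(
    # parse_number() ignores the "$" and the trailing letter
    amount = parse_number(funds),
    fundsFinal = case_when(
      str_ends(funds, "M") ~ amount * 1e6,
      str_ends(funds, "B") ~ amount * 1e9,
      TRUE ~ NA_real_  # any other suffix becomes NA
    )
  )
```

Unlike the ifelse() version, a value that ends in neither "M" nor "B" ends up as NA here instead of being multiplied by 10^9.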