I have a data frame where some values for "revenue" are listed in the hundreds, say "300", and others are listed as "1.5k". Obviously this is annoying, so I need some way of removing the "k" (and dealing with the ".") in those values, and only those values, so that "1.5k" ends up as 1500. Any thoughts?
You could create a function that removes the "k", converts the result to a numeric vector and multiplies it by 1,000:
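```r
library(tidyverse)

# Remove the "k", convert to numeric and multiply by 1,000
to_1000 <- function(x){
  x %>%
    str_remove("k") %>%
    as.numeric() %>%
    {. * 1000}
}

x <- c("3000", "1.5k")

# if_else() evaluates both branches on the whole vector, so as.numeric(x)
# warns about "1.5k" even though that NA is never used
tibble(x) %>%
  mutate(x_num = if_else(str_detect(x, "k"), to_1000(x), as.numeric(x)))
```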
```
# A tibble: 2 x 2
  x     x_num
  <chr> <dbl>
1 3000   3000
2 1.5k   1500
```
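A variation on the first answer, offered as a sketch rather than a tested recipe: instead of choosing between two fully parsed vectors, parse the numeric part once with `readr::parse_number()` (which drops the trailing "k") and multiply it by a per-row factor. Because nothing calls `as.numeric()` on "1.5k", no coercion warning is raised. The data frame `df` and its example values are hypothetical; `revenue` is the column name from the question, and the pattern assumes the only suffix is a lowercase or uppercase k.

```r
library(dplyr)
library(readr)
library(stringr)

# Hypothetical example data; `revenue` is the column named in the question
df <- tibble(revenue = c("300", "1.5k", "2K"))

df <- df %>%
  mutate(
    # 1000 for values ending in k/K, 1 for everything else
    multiplier = if_else(str_detect(revenue, "[kK]$"), 1000, 1),
    # parse_number() keeps only the number ("1.5k" -> 1.5), then rescale
    revenue_num = parse_number(revenue) * multiplier
  ) %>%
  select(-multiplier)

df$revenue_num
```

Keeping the factor in its own column also makes it easy to extend the `if_else()` into a `case_when()` if other suffixes ever turn up.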
Another way to do this is just with regular expressions (`grepl()` and `gsub()` are base R; the tidyverse is loaded only for the pipe):
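```r
library(tidyverse)

string <- c("300", "1.5k")

string %>% ifelse(
  # check if string ends in k (upper/lower case)
  grepl("[kK]$", .),
  # if string ends in k, remove it and multiply by 1000
  1000 * as.numeric(gsub("[kK]$", "", .)),
  # otherwise keep the original string; ifelse() therefore returns
  # character, which is why the result is converted with as.numeric()
  .) %>%
  as.numeric()
```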
```
[1]  300 1500
```
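Since the second answer uses the tidyverse only for the pipe, the same regex idea can be written in base R with no packages at all. This is a minimal sketch, assuming every value is either a plain number or a number followed by a single k/K; the function name `parse_k()` is hypothetical.

```r
# Base R only: parse values such as "300", "1.5k" or "2K" into numbers
parse_k <- function(x) {
  # TRUE where the value ends in k or K
  has_k <- grepl("[kK]$", x)
  # numeric part with the suffix removed ("1.5k" -> 1.5)
  num <- as.numeric(sub("[kK]$", "", x))
  # scale only the k-suffixed values
  num * ifelse(has_k, 1000, 1)
}

parse_k(c("300", "1.5k", "2K"))
```

To convert the column in place you could then write `df$revenue <- parse_k(df$revenue)`, again assuming a data frame `df` with a `revenue` column.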