Complicated string split


I have a data frame where some values for "revenue" are listed in the hundreds, say "300", and others are listed as "1.5k". Obviously this is annoying, so I need some way of stripping the "k" from those values, and only those values, and converting them to plain numbers (so "1.5k" becomes 1500). Any thoughts?

CodePudding user response:

You could create a function that removes the "k", converts the result to a numeric vector, and multiplies it by 1,000.

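library(tidyverse)

# Strip the "k" suffix, convert to numeric, and scale to units
to_1000 <- function(x){
    x %>% 
      str_remove("k") %>% 
      as.numeric() %>% 
      magrittr::multiply_by(1000)
}
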
x <- c("3000","1.5k")

tibble(x) %>% 
  mutate(x_num = if_else(str_detect(x,"k"),to_1000(x),as.numeric(x)))

# A tibble: 2 x 2
  x     x_num
  <chr> <dbl>
1 3000   3000
2 1.5k   1500

CodePudding user response:

Another way to do this is with just a regex (plus the tidyverse for the pipe):


 library(tidyverse)  # only needed for the %>% pipe

 string <- c("300", "1.5k")

 string %>% ifelse(
        # check if string ends in k (upper/lower case)
        grepl("[kK]$", .), 
        # if string ends in k, remove it and multiply by 1000
        1000 * as.numeric(gsub("[kK]$", "", .)), 
        .) %>% as.numeric() 

 [1]  300 1500
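
For the data frame case in the question, the same regex idea can be wrapped in a small base R helper and applied to the column directly. This is only a sketch: the helper name `parse_revenue`, the column name `revenue_num`, and the sample values are made up for illustration, and it assumes "k" (upper or lower case) is the only suffix that appears.

```r
# Hypothetical helper: convert strings such as "300" or "1.5k" to numbers.
# Assumes the only suffix present is a trailing "k" or "K".
parse_revenue <- function(x) {
  x <- trimws(x)
  has_k <- grepl("[kK]$", x)
  # Drop the trailing "k" so every element is a plain number string
  value <- as.numeric(sub("[kK]$", "", x))
  # Scale only the elements that carried the suffix
  ifelse(has_k, value * 1000, value)
}

# Made-up example data standing in for the real data frame
df <- data.frame(revenue = c("300", "1.5k", "2K"), stringsAsFactors = FALSE)
df$revenue_num <- parse_revenue(df$revenue)
df
```

Because the suffix is removed before `as.numeric()` is called, this version does not trigger the "NAs introduced by coercion" warning that comes from calling `as.numeric()` on a value like "1.5k".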
