Extracting Numbers from a String to Divide Them in R


I have a data frame where the last column looks like so:

Signed 1-yr/$2.5M deal with Pacers  
Signed 4-yr/$113M deal with Celtics 
Signed 3-yr/$30M deal with Pacers   

These are all strings. I am trying to get the number before -yr and the number before the M. So, for the first row, I am trying to get 2.5 and 1.

I then want to divide the dollar amount by the number of years, e.g. 2.5/1 for the first row, and store the result in a new column.

I tried str_extract_all(df$col, "\\d"), but this only returns the individual digits in a list. I do not know how to get from there to the result I want.
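
To make the problem concrete, here is a minimal base R sketch (no packages) of why "\\d" alone splits the numbers into single digits, and one possible way to keep each number whole by anchoring on its suffix. The vector name x and the variables years, amount and per_year are made up for illustration; the patterns assume every string contains exactly one "N-yr" and one "$N.NM" part.

```r
# Sample strings copied from the question
x <- c("Signed 1-yr/$2.5M deal with Pacers",
       "Signed 4-yr/$113M deal with Celtics",
       "Signed 3-yr/$30M deal with Pacers")

# "\\d" matches one digit at a time, so "2.5" is split into "2" and "5"
single_digits <- regmatches(x, gregexpr("\\d", x))[[1]]

# Capture the digits right before "-yr" (lazy prefix keeps multi-digit years whole)
years <- as.numeric(sub(".*?(\\d+)-yr.*", "\\1", x, perl = TRUE))

# Capture digits and dots between "$" and "M"
amount <- as.numeric(sub(".*\\$([0-9.]+)M.*", "\\1", x))

# Millions of dollars per contract year
per_year <- amount / years
```

The answers below do the same thing with stringr and tidyr helpers, which are usually easier to read inside a mutate() call.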

CodePudding user response:

Another solution using the stringr package:

library(dplyr)
library(stringr)

df <- data.frame(a = c("Signed 1-yr/$2.5M deal with Pacers",
                       "Signed 4-yr/$113M deal with Celtics",
                       "Signed 3-yr/$30M deal with Pacers"))

# extract "2.5M" and "1-yr", strip the suffixes, then divide amount by years
df |> 
  mutate(diff = as.numeric(str_remove(str_extract(a, "[0-9,.]+M"), "M")) /
                as.numeric(str_remove(str_extract(a, "[0-9]+-yr"), "-yr")))


                                    a  diff
1  Signed 1-yr/$2.5M deal with Pacers  2.50
2 Signed 4-yr/$113M deal with Celtics 28.25
3   Signed 3-yr/$30M deal with Pacers 10.00

CodePudding user response:

A possible solution:


library(tidyverse)

# assumes the strings are in a column named V1
df %>% 
  separate(V1, into = c("V2", "V3"), sep = "/", remove = FALSE) %>% 
  mutate(result = parse_number(V3) / parse_number(V2)) %>% 
  select(V1, result)

#>                                     V1 result
#> 1 Signed 1-yr/$2.5M deal with Pacers     2.50
#> 2  Signed 4-yr/$113M deal with Celtics  28.25
#> 3 Signed 3-yr/$30M deal with Pacers     10.00

CodePudding user response:

library(tidyverse)

df %>% 
  separate(col, c('length', 'value'), sep = "/", remove = FALSE) %>% 
  mutate(length = str_extract(length, "\\d+"),
         value = str_extract(value, "[[:digit:]]+\\.*[[:digit:]]*")) %>%
  mutate(value_by_year = as.numeric(value) / as.numeric(length))

# A tibble: 3 x 4
  col                                 length value value_by_year
  <chr>                               <chr>  <chr>         <dbl>
1 Signed 1-yr/$2.5M deal with Pacers  1      2.5             2.5
2 Signed 4-yr/$113M deal with Celtics 4      113            28.2
3 Signed 3-yr/$30M deal with Pacers   3      30             10  

CodePudding user response:

You could use extract() from tidyr:

library(tidyverse)

# group 1 = years, \\D+ skips "-yr/$", group 2 = amount in millions
df %>%
  extract(col, c("yr", "M"), "([\\d.]+)\\D+([\\d.]+)", remove = FALSE, convert = TRUE) %>%
  mutate(res = M / yr)

# # A tibble: 3 × 4
#   col                                    yr     M   res
#   <chr>                               <int> <dbl> <dbl>
# 1 Signed 1-yr/$2.5M deal with Pacers      1   2.5   2.5
# 2 Signed 4-yr/$113M deal with Celtics     4 113    28.2
# 3 Signed 3-yr/$30M deal with Pacers       3  30    10

Remember to set convert = TRUE so that the extracted columns are converted from character to numeric.

df <- tibble(col = c("Signed 1-yr/$2.5M deal with Pacers",  
"Signed 4-yr/$113M deal with Celtics", 
"Signed 3-yr/$30M deal with Pacers"))
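
On the convert = TRUE note above: as far as I understand the tidyr documentation, that option runs type.convert() with as.is = TRUE on the new columns, which would explain why yr comes out as <int> and M as <dbl> in the printed tibble. A small base R sketch of that conversion, using hand-written character vectors (yr_chr and m_chr are illustrative names, not anything tidyr creates):

```r
# Captured regex groups start out as character vectors
yr_chr <- c("1", "4", "3")
m_chr  <- c("2.5", "113", "30")

# type.convert() picks the narrowest type that fits every value
yr <- type.convert(yr_chr, as.is = TRUE)  # whole numbers only, so integer
m  <- type.convert(m_chr, as.is = TRUE)   # contains a decimal, so double

# Dividing a double by an integer gives a double
res <- m / yr
```

Without the conversion, M / yr would fail because both columns would still be character.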