Extract a value between 2 different characters in R

If I have this:

2 (5.7%)
34 (8.9%)

How can I extract only what is between the first ( and the % (just the percentage number)?

5.7
8.9

Answer:

We may use sub to match the characters (.*) up to the opening parenthesis (\\(), capture the characters that are not % ([^%]+) as a group, and replace the whole string with a backreference to the captured group (\\1):

as.numeric(sub(".*\\(([^%]+).*", "\\1", str1))
[1] 5.7 8.9
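A base R alternative that avoids the backreference, offered as a sketch rather than part of the original answer: regexpr() with perl = TRUE supports a lookbehind, so the match can start right after the "(" without including it, and regmatches() returns the matched text. One caveat: regmatches() combined with regexpr() silently drops elements that do not match, so the result can be shorter than str1.

```r
str1 <- c("2 (5.7%)", "34 (8.9%)")

# (?<=\\() is a lookbehind: the match must be preceded by "(",
# but the "(" itself is not part of the match.
# [^%]+ then takes everything up to (not including) the first "%".
pct <- regmatches(str1, regexpr("(?<=\\()[^%]+", str1, perl = TRUE))
as.numeric(pct)
#> [1] 5.7 8.9
```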

Or use str_extract from stringr (the group argument needs stringr 1.5.0 or later):

library(stringr)
as.numeric(str_extract(str1, "\\((.*)%", group = 1))
[1] 5.7 8.9
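If an older stringr without the group argument is installed, str_match() gives the same result, because it returns every capture group as a column. A minimal sketch, assuming the same str1 as in the data below:

```r
library(stringr)

str1 <- c("2 (5.7%)", "34 (8.9%)")

# str_match() returns a character matrix: column 1 holds the full match,
# column 2 onwards hold the capture groups.
m <- str_match(str1, "\\(([^%]+)%")
as.numeric(m[, 2])
#> [1] 5.7 8.9
```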

data

 str1 <- c("2 (5.7%)", "34 (8.9%)")

Answer:

To add to @akrun's answer: I've come across this issue before and ended up splitting strings like 'mean (stdev)' or 'count (proportion%)' into separate columns using this approach:

library(tidyverse)
df <- data.frame(result = c("2 (5.7%)",
                            "34 (8.9%)"))

df
#>      result
#> 1  2 (5.7%)
#> 2 34 (8.9%)

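df <- df %>%
  # drop the "(" and the trailing "%)", leaving e.g. "2 5.7"
  mutate(result = str_remove_all(result, "\\(|\\%\\)")) %>%
  # split on the space; convert = TRUE turns the pieces into int/num
  separate(col = result, into = c("count", "proportion (%)"),
           sep = " ", convert = TRUE)
df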
#>   count proportion (%)
#> 1     2            5.7
#> 2    34            8.9

Because of convert = TRUE, the new columns get proper types (integer and numeric) instead of staying character:

str(df)
#> 'data.frame':    2 obs. of  2 variables:
#>  $ count         : int  2 34
#>  $ proportion (%): num  5.7 8.9
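For comparison, a base R sketch of the same split (not from the original answer) uses utils::strcapture(). The proto argument is a zero-row data frame whose column names and types are applied to the capture groups, so the conversion happens in the same step. The pattern assumes every value looks exactly like "count (proportion%)"; strings that do not match come back as NA rows.

```r
df <- data.frame(result = c("2 (5.7%)", "34 (8.9%)"))

# Group 1: the leading integer count; group 2: the number before "%".
# proto gives the output column names and the type each group is converted to.
split_df <- utils::strcapture(
  pattern = "^([0-9]+) \\(([0-9.]+)%\\)$",
  x = df$result,
  proto = data.frame(count = integer(), proportion = numeric())
)
str(split_df)
```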

Answer:

Just for fun:

library(readr)
library(stringr)
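# parse_number(str1) pulls out the leading counts (2, 34); these are removed
# from the strings, and parse_number() then picks up the percentage
parse_number(str_replace(str1, paste(parse_number(str1), collapse = "|"), ''))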

[1] 5.7 8.9
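The version above builds one pattern from the counts of all elements, so a count from one string can match inside another: with c("1 (5.7%)", "12 (3%)") the pattern is "1|12", the "1" is removed from the front of "12 (3%)", and that element comes back as 2. A per-element variant, sketched under the assumption that the percentage always follows the first "(", strips everything up to that parenthesis before parsing:

```r
library(readr)

str1 <- c("2 (5.7%)", "34 (8.9%)")

# Drop everything up to and including the first "(" in each string,
# leaving e.g. "5.7%)"; parse_number() then ignores the trailing "%)".
pct <- parse_number(sub("^[^(]*\\(", "", str1))
pct
```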