If I have this:
2 (5.7%)
34 (8.9%)
How can I extract just what is between the first ( and the % (i.e. only the percentage number), so that I get:
5.7
8.9
Answer:
We may use sub to match any characters (.*) up to the opening parenthesis (\\(), capture one or more characters that are not % ([^%]+) as a group, and replace the whole match with a backreference to the captured group (\\1):
as.numeric(sub(".*\\(([^%]+).*", "\\1", str1))
[1] 5.7 8.9
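One caveat with the pattern above: `.*` is greedy, so `.*\\(` runs to the last "(" rather than the first. Also, `sub()` returns a non-matching string unchanged, so `as.numeric()` turns it into NA with a coercion warning. Below is a minimal sketch that anchors at the first "(" and restricts the capture to digits and dots. The helper name `extract_pct` is hypothetical and not part of any package.

```r
# Hypothetical helper: return the number between the FIRST "(" and the
# following "%", or NA when a string contains no such number.
extract_pct <- function(x) {
  # ^[^(]*    : everything before the first "(" (it cannot cross a "(")
  # \\(       : the opening parenthesis itself
  # ([0-9.]+) : capture group 1, the digits and decimal point
  # %.*$      : the percent sign and whatever follows it
  pattern <- "^[^(]*\\(([0-9.]+)%.*$"
  out <- rep(NA_real_, length(x))
  hit <- grepl(pattern, x)
  # Only convert strings that matched, so there is no coercion warning
  out[hit] <- as.numeric(sub(pattern, "\\1", x[hit]))
  out
}

str1 <- c("2 (5.7%)", "34 (8.9%)", "no percentage")
extract_pct(str1)
#> [1] 5.7 8.9  NA
```

For a string such as "1 (2.5%) (9.9%)" this helper returns 2.5, whereas the greedy `.*\\(` pattern would return 9.9.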
Or use str_extract from stringr (the group argument requires stringr 1.5.0 or later):
library(stringr)
as.numeric(str_extract(str1, "\\((.*)%", group = 1))
[1] 5.7 8.9
Data:
str1 <- c("2 (5.7%)", "34 (8.9%)")
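If your stringr version predates the `group` argument of `str_extract()`, `str_match()` gives the same result. It returns a character matrix whose first column holds the whole match and whose remaining columns hold the capture groups; strings with no match give NA. Here is a minimal sketch using the same `str1`:

```r
library(stringr)

str1 <- c("2 (5.7%)", "34 (8.9%)")

# Column 1 holds the full match, e.g. "(5.7%"; column 2 holds capture group 1.
m <- str_match(str1, "\\(([0-9.]+)%")
pct <- as.numeric(m[, 2])
pct
#> [1] 5.7 8.9
```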
Answer:
To add to @akrun's answer: I've run into this before and ended up 'splitting up' a mean (stdev) or a count (proportion%) into separate columns using this approach:
library(tidyverse)
df <- data.frame(result = c("2 (5.7%)",
"34 (8.9%)"))
df
#> result
#> 1 2 (5.7%)
#> 2 34 (8.9%)
df_split <- df %>%
  mutate(result = str_remove_all(result, "\\(|\\%\\)")) %>%
  separate(col = result, into = c("count", "proportion (%)"),
           sep = " ", convert = TRUE)
df_split
#> count proportion (%)
#> 1 2 5.7
#> 2 34 8.9
Because of convert = TRUE, the new columns get the 'correct' types:
str(df_split)
#> 'data.frame': 2 obs. of 2 variables:
#> $ count : int 2 34
#> $ proportion (%): num 5.7 8.9
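A variant of the same idea, sketched with `tidyr::extract()`, splits the column into new columns by regex capture groups in one step, so the `str_remove_all()` clean-up is not needed. The regex assumes every value looks like `count (proportion%)`; for a `mean (stdev)` column you would drop the `%` from the pattern. In tidyr 1.3.0 and later, `extract()` is superseded by `separate_wider_regex()` but is still available.

```r
library(tidyr)

df <- data.frame(result = c("2 (5.7%)", "34 (8.9%)"))

# Each (...) group in `regex` becomes one column named in `into`;
# convert = TRUE runs type.convert(), so count is integer and proportion double.
df_split <- tidyr::extract(
  df, result,
  into = c("count", "proportion"),
  regex = "^([0-9]+) \\(([0-9.]+)%\\)$",
  convert = TRUE
)
str(df_split)
```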
Answer:
Just for fun:
library(readr)
library(stringr)
parse_number(str_replace(str1, paste(parse_number(str1), collapse = "|"), ''))
[1] 5.7 8.9
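The approach above builds one alternation from all the counts (e.g. "2|34") and matches it anywhere in each string. That can strip only part of a count when one count is a prefix of another: with counts 3 and 34 the pattern is "3|34", so "34 (8.9%)" loses only its "3" and `parse_number()` then returns 4. A sketch of a sturdier variant, still using readr and stringr, removes just the leading number of each string with an anchored pattern:

```r
library(readr)
library(stringr)

str1 <- c("3 (1.2%)", "34 (8.9%)")

# "^\\s*[0-9]+" only matches digits at the start of each string, so every
# string loses exactly its own leading count (" (8.9%)" remains, for example),
# and parse_number() then skips the "(" and "%)" around the percentage.
pct <- parse_number(str_remove(str1, "^\\s*[0-9]+"))
pct
#> [1] 1.2 8.9
```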