I have a data.frame df with a character column text that contains text. From that column, I would like to extract all percentage numbers (say, 1.2% and -2.3%), but not the ordinary numbers (say, 123 and 1.2), into a character vector.
A small example:
df <- data.frame(text = c("this text is 1.3% this is 1.4% and this -1.5%",
                          "this text is 123.3% this 123.3 and this 1234.5"))
Required output:
[1] "1.3%" "1.4%" "-1.5%" "123.3%"
Is that possible?
CodePudding user response:
Probably not the most robust general-purpose solution, but it works for your example:
unlist(stringr::str_extract_all(df$text, "[+\\-]?[0-9\\.]+%"))
#[1] "1.3%"   "1.4%"   "-1.5%"  "123.3%"
## or using R's native forward pipe operator, available since R 4.1.0
stringr::str_extract_all(df$text, "[+\\-]?[0-9\\.]+%") |> unlist()
#[1] "1.3%"   "1.4%"   "-1.5%"  "123.3%"
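If you would rather avoid the stringr dependency, or want a pattern that does not accept stray dots such as "..%", here is a minimal base R sketch. The stricter pattern and the variable names pattern and percentages are my own assumptions, not part of the answer above; regmatches() and gregexpr() are swapped in for str_extract_all().

```r
# Same example data as in the question
df <- data.frame(text = c("this text is 1.3% this is 1.4% and this -1.5%",
                          "this text is 123.3% this 123.3 and this 1234.5"))

# Assumed stricter pattern: optional sign, digits, optional decimal part,
# then a literal percent sign
pattern <- "[+-]?[0-9]+(\\.[0-9]+)?%"

# gregexpr() locates every match in each string; regmatches() extracts them
# as a list with one character vector per input string
percentages <- unlist(regmatches(df$text, gregexpr(pattern, df$text)))
percentages
#[1] "1.3%"   "1.4%"   "-1.5%"  "123.3%"
```

Numbers without a trailing percent sign (123.3 and 1234.5) are skipped because the pattern requires the % right after the digits.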
Calling unlist() gives the character vector you asked for. But if you want to store the results in a new data frame column instead, skip unlist() and just do:
df$percentages <- stringr::str_extract_all(df$text, "[+\\-]?[0-9\\.]+%")
df
#                                            text       percentages
#1  this text is 1.3% this is 1.4% and this -1.5% 1.3%, 1.4%, -1.5%
#2 this text is 123.3% this 123.3 and this 1234.5            123.3%
The new column percentages is itself a list:
str(df$percentages)
#List of 2
# $ : chr [1:3] "1.3%" "1.4%" "-1.5%"
# $ : chr "123.3%"
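A list column keeps the matches grouped by row, which makes it easy to count them or to reshape them later. The following is only a sketch in base R: regmatches() and gregexpr() stand in for str_extract_all(), and the names n_found and long are hypothetical.

```r
# Same example data as in the question
df <- data.frame(text = c("this text is 1.3% this is 1.4% and this -1.5%",
                          "this text is 123.3% this 123.3 and this 1234.5"))

# regmatches() returns a list with one character vector per row,
# so it can be stored directly as a list column
df$percentages <- regmatches(df$text, gregexpr("[+-]?[0-9.]+%", df$text))

# Number of percentages found in each row
n_found <- lengths(df$percentages)

# Long format: repeat each row index once per match it produced
long <- data.frame(row = rep(seq_len(nrow(df)), n_found),
                   percentage = unlist(df$percentages))
long
```

Here n_found is c(3, 1), so long has four rows: three that point back to row 1 of df and one that points back to row 2.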
CodePudding user response:
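Here is an alternative tidyverse way:
parse_number from the readr package returns only the first number in each string, so we first split the text into one token per row with separate_rows and keep the tokens that contain a digit. Then we extract the numbers with parse_number, and within an ifelse statement we paste the percent sign back onto the tokens that had one (the others become NA). Finally, pull returns the result as a vector.
library(tidyverse)

df %>%
  separate_rows(text, sep = "\\s+") %>%
  filter(str_detect(text, "[0-9]")) %>%
  mutate(x = parse_number(text),
         x = ifelse(str_detect(text, "%"), paste0(x, "%"), NA_character_)) %>%
  pull(x)
#[1] "1.3%"   "1.4%"   "-1.5%"  "123.3%" NA       NA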