In the data frame below, I am trying to build a new column "avcomp" that extracts the numeric value between the whitespace and the % sign from the column "code" when duty_nature is "M" or "C". I tried the code below, but it only manages to get the first two numbers and the percent sign. Can you please help?
The avcomp column should read
>[1] "30", "0.50", ""
Thank you!
library(stringr)  # provides str_sub() and str_match()
code <- c("Greater than 30% of something", "Less than 0.50% of something", "30%")
duty_nature <- c("M", "C", "A")
test <- data.frame(code, duty_nature)
test$avcomp <- ifelse(test$duty_nature == "M" | test$duty_nature == "C", str_sub(str_match(test$code, "\\s*(.*?)%\\s*"), -4, -1), "")
CodePudding user response:
The regex pattern [0-9]+[.]?[0-9]*(?=%)
matches a number with an optional decimal point that is immediately followed by a percent sign; the lookahead (?=%) checks for the sign without including it in the match:
library(tidyverse)
code <- c("Greater than 30% of something", "Less than 0.50% of something", "30%")
duty_nature <- c("M", "C", "A")
test <- data.frame(code, duty_nature)
test %>%
  mutate(
    avcomp = ifelse(
      duty_nature %in% c("M", "C"),
      code %>% str_extract("[0-9]+[.]?[0-9]*(?=%)") %>% as.numeric(),
      NA
    )
  )
#>                            code duty_nature avcomp
#> 1 Greater than 30% of something           M   30.0
#> 2  Less than 0.50% of something           C    0.5
#> 3                           30%           A     NA
Created on 2022-03-21 by the reprex package (v2.0.0)
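The same lookahead pattern can also be tried without stringr. The following is only a minimal sketch in base R, assuming the same sample data as above; `pattern`, `m`, and `hits` are illustrative names, and the result is kept as character with "" for the non-matching duty_nature, as the question asks.

```r
code <- c("Greater than 30% of something", "Less than 0.50% of something", "30%")
duty_nature <- c("M", "C", "A")
test <- data.frame(code, duty_nature)

# perl = TRUE is needed because the lookahead (?=%) is a Perl-style feature
pattern <- "[0-9]+[.]?[0-9]*(?=%)"
m <- regexpr(pattern, test$code, perl = TRUE)

# regexpr() returns -1 where nothing matched, and regmatches() only returns
# the matched pieces, so fill a character vector by position
hits <- rep(NA_character_, nrow(test))
hits[m > 0] <- regmatches(test$code, m)

# keep the extracted number only for duty_nature "M" or "C"
test$avcomp <- ifelse(test$duty_nature %in% c("M", "C"), hits, "")
test
```

Keeping the value as character preserves the trailing zero in "0.50"; wrapping `hits` in as.numeric() would give numbers instead, as in the answer above.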
CodePudding user response:
Create a logical index with %in% and change the new column only where the index is TRUE.
code <- c("Greater than 30% of something", "Less than 0.50% of something", "30%")
duty_nature <- c("M", "C", "A")
test <- data.frame(code, duty_nature)
test$avcomp <- ""
i <- test$duty_nature %in% c("M", "C")
test$avcomp[i] <- stringr::str_match(test$code, "(\\d+\\.*\\d*)%")[i, 2]
test
#>                            code duty_nature avcomp
#> 1 Greater than 30% of something           M     30
#> 2  Less than 0.50% of something           C   0.50
#> 3                           30%           A
Created on 2022-03-21 by the reprex package (v2.0.1)
A tidyverse solution.
suppressPackageStartupMessages(library(tidyverse))
code <- c("Greater than 30% of something", "Less than 0.50% of something", "30%")
duty_nature <- c("M", "C", "A")
test <- data.frame(code, duty_nature)
test %>%
  mutate(
    avcomp = if_else(duty_nature %in% c("M", "C"), str_match(code, "(\\d+\\.*\\d*)%")[, 2], "")
  )
#>                            code duty_nature avcomp
#> 1 Greater than 30% of something           M     30
#> 2  Less than 0.50% of something           C   0.50
#> 3                           30%           A
Created on 2022-03-21 by the reprex package (v2.0.1)