In R, Extract number value before % sign in a dataframe column of strings

Time:03-21

In the data frame below, I am trying to build a new column "avcomp" that extracts the numeric value between the whitespace and the % sign from the column "code" when duty_nature is "M" or "C". I tried the code below, but it only manages to get the first two numbers and the percent sign. Can you please help?

The avcomp column should read

>[1] "30", "0.50", ""

Thank you!

code <- c("Greater than 30% of something","Less than 0.50% of something","30%")
duty_nature<- c("M","C","A")
test <-data.frame(code,duty_nature)
  
test$avcomp <- ifelse(test$duty_nature == "M" | test$duty_nature == "C",str_sub(str_match(test$code,"\\s*(.*?)%\\s*"),-4,-1),"")

CodePudding user response:

The regex pattern [0-9]+[.]?[0-9]*(?=%) matches a number, optionally containing a decimal point, that is immediately followed by a percent sign. The (?=%) part is a lookahead, so the % is required but is not included in the match:

library(tidyverse)
code <- c("Greater than 30% of something", "Less than 0.50% of something", "30%")
duty_nature <- c("M", "C", "A")
test <- data.frame(code, duty_nature)

test %>%
  mutate(
    avcomp = ifelse(
      duty_nature %in% c("M", "C"),
      code %>% str_extract("[0-9]+[.]?[0-9]*(?=%)") %>% as.numeric(),
      NA
    )
  )
#>                            code duty_nature avcomp
#> 1 Greater than 30% of something           M   30.0
#> 2  Less than 0.50% of something           C    0.5
#> 3                           30%           A     NA

Created on 2022-03-21 by the reprex package (v2.0.0)
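
For readers who prefer not to load the tidyverse, the same lookahead pattern can probably be applied with base R alone. This is a minimal sketch, not part of the answer above: it assumes the pattern with the + quantifier as written in the explanation, uses regexpr() and regmatches() from base R, and keeps the result as character (as the question asked) rather than converting to numeric. The helper name num is only illustrative.

```r
# Base R sketch: same lookahead pattern, no packages needed.
code <- c("Greater than 30% of something", "Less than 0.50% of something", "30%")
duty_nature <- c("M", "C", "A")
test <- data.frame(code, duty_nature)

# perl = TRUE is required because the lookahead (?=%) is PCRE syntax.
pattern <- "[0-9]+[.]?[0-9]*(?=%)"
m <- regexpr(pattern, test$code, perl = TRUE)

# regmatches() silently drops elements without a match, so fill a
# full-length vector by position (num is an illustrative name).
num <- rep(NA_character_, nrow(test))
num[m != -1] <- regmatches(test$code, m)

# Keep the extracted value only where duty_nature is "M" or "C".
test$avcomp <- ifelse(test$duty_nature %in% c("M", "C"), num, "")
test$avcomp
```

Filling num by position matters when some rows contain no percentage at all: those rows stay NA instead of shifting the remaining matches up.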

CodePudding user response:

Create a logical index with %in% and change the new column where the index is TRUE.

code <- c("Greater than 30% of something","Less than 0.50% of something","30%")
duty_nature<- c("M","C","A")
test <-data.frame(code,duty_nature)

test$avcomp <- ""
i <- test$duty_nature %in% c("M", "C")
test$avcomp[i] <- stringr::str_match(test$code, "(\\d+\\.*\\d*)%")[i, 2]
test
#>                            code duty_nature avcomp
#> 1 Greater than 30% of something           M     30
#> 2  Less than 0.50% of something           C   0.50
#> 3                           30%           A

Created on 2022-03-21 by the reprex package (v2.0.1)


The same approach as a tidyverse solution:

suppressPackageStartupMessages(library(tidyverse))

code <- c("Greater than 30% of something","Less than 0.50% of something","30%")
duty_nature<- c("M","C","A")
test <-data.frame(code,duty_nature)

test %>%
  mutate(
    avcomp = if_else(duty_nature %in% c("M", "C"), str_match(code, "(\\d+\\.*\\d*)%")[, 2], "")
  )
#>                            code duty_nature avcomp
#> 1 Greater than 30% of something           M     30
#> 2  Less than 0.50% of something           C   0.50
#> 3                           30%           A

Created on 2022-03-21 by the reprex package (v2.0.1)
