Combination Regmatches and Replacement for specific Character


I tried to transform the values that match a specific pattern (digits followed by "BT"), but my code failed. This is my code:

df <- data.frame(
  exposure = c("123BT", "113BB", "116BB", "117BT")
)

df %>%
  mutate(exposure2 = case_when(exposure == regmatches("d \\BT") ~ paste0("-", exposure),
                               TRUE ~ exposure))

The error is:

Error: Problem with `mutate()` column `exposure2`.
i `exposure2 = case_when(...)`.
x argument "m" is missing, with no default
Run `rlang::last_error()` to see where the error occurred.

My target output is:

df <- data.frame(
  exposure = c("123BT", "113BB", "116BB", "117BT"),
  exposure2 = c(-123, 113, 116, -117)
)

CodePudding user response:

I recommend using the stringr library; you can extract the numbers with the regex (\\d+):


library(dplyr)
library(stringr)

df %>%
  mutate(exposure2 = case_when(str_detect(exposure, "BT") ~ paste0("-", str_extract(exposure, "(\\d+)")),
                               TRUE ~ str_extract(exposure, "(\\d+)")))


  exposure exposure2
1    123BT      -123
2    113BB       113
3    116BB       116
4    117BT      -117

If you still prefer to use regmatches, you can get the same result with:

df %>%
  mutate(exposure2 = case_when(exposure %in% regmatches(exposure, regexpr("\\d+BT", exposure)) ~ paste0("-", regmatches(exposure, regexpr("\\d+", exposure))),
                               TRUE ~ regmatches(exposure, regexpr("\\d+", exposure))))

CodePudding user response:

First, a concise solution that you can easily use inside dplyr::mutate. Using gsub, we remove the non-digit characters and coerce the result with as.integer. We then multiply the result by 1 or -1 depending on whether the string contains "BT"; for this we use grepl (which returns a logical) and add 1L (which coerces it to integer) to get the index 1 or 2.

c(1, -1)[grepl('BT', df$exposure) + 1L]*as.integer(gsub('\\D', '', df$exposure))
# [1] -123  113  116 -117
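
To see why the indexing works, the one-liner can be broken into its intermediate steps. This is only an illustrative sketch in base R; the names `has_bt`, `idx`, `multiplier`, and `digits` are introduced here for readability and are not part of the answer above.

```r
exposure <- c("123BT", "113BB", "116BB", "117BT")

# TRUE where the string contains "BT"
has_bt <- grepl("BT", exposure)

# FALSE + 1L is 1L and TRUE + 1L is 2L, so this is a valid index into c(1, -1)
idx <- has_bt + 1L

# 1 for strings without "BT", -1 for strings with "BT"
multiplier <- c(1, -1)[idx]

# drop every non-digit character and convert what is left to integer
digits <- as.integer(gsub("\\D", "", exposure))

result <- multiplier * digits
print(result)
```

Splitting the steps like this also makes it easy to inspect `idx` if the pattern ever needs to change.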

The above is the recommended solution. The solution you envision is more complex because it processes the information less efficiently. I implement its logic in a small function¹ to demonstrate.

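f <- \(x) {
  rm <- regmatches(x, regexpr("\\d+BT", x))  # elements made of digits followed by "BT"
  o <- gsub('\\D', '', x)                    # keep only the digits
  o <- ifelse(x %in% rm, paste0('-', o), o)  # prepend "-" where "BT" matched
  as.integer(o)
}

f(df$exposure)
# [1] -123  113  116 -117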

¹ Notes: regmatches needs match data as its second argument, e.g. from regexpr. The regex should actually look something like "\\d+BT".
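
The note can be checked directly in base R. The sketch below (variable names are illustrative) shows that `regexpr` produces the match data that `regmatches` expects as its argument `m`, and that `regmatches` returns only the elements that matched, which is why the function compares with `%in%` rather than `==`.

```r
x <- c("123BT", "113BB", "116BB", "117BT")

# regexpr() gives the start of the first match per element, or -1 for no match
m <- regexpr("\\d+BT", x)

# regmatches() needs both the strings and the match data;
# leaving out `m` is what triggers the 'argument "m" is missing' error
matched <- regmatches(x, m)

# only the matching elements come back, so the result is shorter than x
print(matched)

# %in% maps the shorter result back to a logical vector of the original length
print(x %in% matched)
```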


df <- structure(list(exposure = c("123BT", "113BB", "116BB", "117BT"
)), class = "data.frame", row.names = c(NA, -4L))

CodePudding user response:

(-1)^grepl('BT', df$exposure) * readr::parse_number(df$exposure)
# [1] -123  113  116 -117