I'm trying to modify strings that match a specific pattern (digits followed by "BT"), but my code fails. This is my code:
df <- data.frame(
exposure = c("123BT", "113BB", "116BB", "117BT")
)
df %>%
mutate(
exposure2 = case_when(exposure == regmatches("d+\\BT") ~ paste0("-", exposure),
TRUE ~ exposure)
)
The error is:
Error: Problem with `mutate()` column `exposure2`.
i `exposure2 = case_when(...)`.
x argument "m" is missing, with no default
Run `rlang::last_error()` to see where the error occurred.
My target is:
df <- data.frame(
exposure = c("123BT", "113BB", "116BB", "117BT"),
exposure2 = c(-123, 113, 116, -117)
)
CodePudding user response:
I recommend using the stringr package; you can extract the numbers with the regex (\\d)+:
library(stringr)
library(dplyr)
df %>%
mutate(
exposure2 = case_when(str_detect(exposure, "BT") ~ paste0("-", str_extract(exposure, "(\\d)+")),
TRUE ~ str_extract(exposure, "(\\d)+"))
)
Output:
exposure exposure2
1 123BT -123
2 113BB 113
3 116BB 116
4 117BT -117
If you still prefer to use regmatches, you can get the same result with:
df %>%
mutate(
exposure2 = case_when(exposure %in% regmatches(exposure, regexpr("\\d+BT", exposure)) ~ paste0("-", regmatches(exposure, regexpr("\\d+", exposure))),
TRUE ~ regmatches(exposure, regexpr("\\d+", exposure)))
)
CodePudding user response:
First, a concise solution that you can easily use inside your dplyr::mutate. Using gsub, we remove the non-digit characters and coerce the result with as.integer. We then multiply the result by 1 or -1 depending on whether the string contains "BT"; for this we use grepl (which gives a boolean) and add 1L (which coerces it to integer) to get the index 1 or 2.
c(1, -1)[grepl('BT', df$exposure) + 1L]*as.integer(gsub('\\D', '', df$exposure))
# [1] -123 113 116 -117
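As a minimal sketch of how the one-liner could sit inside dplyr::mutate (the helper column name multiplier is only an illustrative assumption, not part of the original answer):

```r
library(dplyr)

df <- data.frame(exposure = c("123BT", "113BB", "116BB", "117BT"))

res <- df %>%
  mutate(
    # FALSE + 1L is 1 and TRUE + 1L is 2, so rows containing "BT" pick -1
    multiplier = c(1, -1)[grepl("BT", exposure) + 1L],
    # strip every non-digit character, then apply the sign
    exposure2 = multiplier * as.integer(gsub("\\D", "", exposure))
  ) %>%
  select(-multiplier)  # drop the hypothetical helper column again

res
```

Unlike the paste0-based variants, this keeps exposure2 numeric, which is what the target data frame in the question asks for.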
Above is the recommended solution. The approach you envision is more complex because it processes the information less efficiently. I implement its logic in a small function¹ to demonstrate.
f <- \(x) {
rm <- regmatches(x, regexpr("\\d+BT", x))
o <- gsub('\\D', '', x)
o <- ifelse(x %in% rm, paste0('-', o), o)
as.integer(o)
}
f(df$exposure)
# [1] -123 113 116 -117
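¹ Notes: regmatches needs match information as its second argument (m), e.g. from regexpr; calling it with only a pattern is what triggers the 'argument "m" is missing' error. The regex should actually look something like "\\d+BT".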
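To make the note concrete, here is a small base R sketch of how regexpr and regmatches work together (the variable names x, m and matched are just illustrative):

```r
x <- c("123BT", "113BB", "116BB", "117BT")

# regexpr() returns the start position of the first match in each element;
# -1 marks elements where the pattern does not match
m <- regexpr("\\d+BT", x)

# regmatches() needs both the data and the match info, and it returns
# only the matched pieces, so the result can be shorter than x
matched <- regmatches(x, m)
matched

# that is why the answers test membership with %in% instead of using ==
is_bt <- x %in% matched
is_bt
```

Because matched only holds the elements that matched, comparing it to x with == would recycle values; %in% avoids that.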
Data:
df <- structure(list(exposure = c("123BT", "113BB", "116BB", "117BT"
)), class = "data.frame", row.names = c(NA, -4L))
CodePudding user response:
library(readr)
(-1)^grepl('BT', df$exposure) * parse_number(df$exposure)
[1] -123 113 116 -117