I am looking for a regular expression that will help me extract terms in a formula that begin with a function and are within brackets.
For example, say I have the following formula:
formula <- formula(cured ~ dur(duration) + age + sex + duranduran)
I can extract the individual terms:
attr(terms(formula), "term.labels")
which returns the vector
[1] "dur(duration)" "age" "sex" "duranduran"
I want to use grep with some regex to get the index of any terms that are enclosed by dur(). So far, I have tried
grep("^dur", attr(terms(formula), "term.labels"))
but this doesn't take into account the brackets. It returns 1 and 4, as the terms dur(duration) and duranduran both start with dur. I am looking for a regular expression for: begins with dur( AND ends with ).
CodePudding user response:
You can use
grep("^dur\\(.*\\)$", attr(terms(formula), "term.labels"))
Details:
^ - start of string
dur - the dur substring
\( - a ( char
.* - any zero or more chars, as many as possible
\) - a ) char
$ - end of string.
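As a quick check of the pattern described above, here is a minimal sketch run against the term labels from the question. It assumes the terms in the formula are joined with `+`; the variable names (`f`, `labs`, `pattern`, `idx`, `is_dur`, `matched`) are only illustrative.

```r
# Formula from the question, assuming the terms are joined by `+`
f <- cured ~ dur(duration) + age + sex + duranduran
labs <- attr(terms(f), "term.labels")

# Starts with "dur(", ends with ")"
pattern <- "^dur\\(.*\\)$"

# grep() returns the positions of the matching terms
idx <- grep(pattern, labs)

# grepl() returns one TRUE/FALSE per term instead
is_dur <- grepl(pattern, labs)

# value = TRUE returns the matching labels themselves
matched <- grep(pattern, labs, value = TRUE)
```

Because the pattern requires the literal `(` right after `dur`, the term duranduran no longer matches.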
CodePudding user response:
This does what you want with the stringr package, assuming that dur() is always the target function.
Let me know if you want to generalize it.
library(stringr)
formula <- formula(cured ~ dur(duration) + age + sex + duranduran)
elements = attr(terms(formula), "term.labels")
idx = str_which(elements, "^dur\\(\\w+\\)")
idx
#> [1] 1
Created on 2022-10-07 by the reprex package (v2.0.1)
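One possible way to generalize this to any function name is sketched below in base R. The helper `which_wrapped()` is hypothetical (it is not part of stringr or base R), and the example formula with `log(age)` is made up for illustration. The sketch uses PCRE's `\Q...\E` quoting so that characters such as `.` in a function name are matched literally, and it anchors the closing bracket at the end of the string, which is slightly stricter than the pattern above.

```r
# Hypothetical helper: indices of labels of the form fun(...)
which_wrapped <- function(labels, fun) {
  # \Q...\E makes PCRE treat the function name as a literal string
  pattern <- paste0("^\\Q", fun, "\\E\\(.*\\)$")
  grep(pattern, labels, perl = TRUE)
}

# Illustrative formula (an assumption, not the one from the question)
f <- cured ~ dur(duration) + log(age) + sex + duranduran
labs <- attr(terms(f), "term.labels")

dur_idx <- which_wrapped(labs, "dur")
log_idx <- which_wrapped(labs, "log")
```

This assumes the function name itself does not contain the sequence `\E`, which would end the quoting early.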
CodePudding user response:
This does it without regular expressions. First get the labels and then using startsWith/endsWith we have the following. Omit which if a logical vector is ok as the result.
labs <- labels(terms(formula))
which(startsWith(labs, "dur(") & endsWith(labs, ")"))
Note
The input, equivalent to but simplified from the one in the question:
formula <- cured ~ dur(duration) + age + sex + duranduran
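To illustrate the difference between the logical vector and the index, here is a small sketch using the input above. The names `f`, `keep`, `dur_terms` and `other_terms` are only illustrative.

```r
f <- cured ~ dur(duration) + age + sex + duranduran
labs <- labels(terms(f))

# Logical vector: one TRUE/FALSE per term label
keep <- startsWith(labs, "dur(") & endsWith(labs, ")")

# which() converts the logical vector to integer positions
idx <- which(keep)

# The logical vector can be used directly for subsetting
dur_terms <- labs[keep]
other_terms <- labs[!keep]
```

The logical form is convenient when the matching and non-matching terms are both needed, since it can be negated with `!`.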