Extract formula terms using regex in R

Time:10-08

I am looking for a regular expression that will help me extract terms in a formula that begin with a function and are within brackets.

For example, say I have the following formula:

formula <- formula(cured ~ dur(duration) + age + sex + duranduran)

I can extract the individual terms:

attr(terms(formula), "term.labels")

which returns the vector

[1] "dur(duration)" "age"          "sex"          "duranduran"

I want to use grep with some regex to give the index of any terms that are enclosed by dur(). So far, I have tried

grep("^dur", attr(terms(formula), "term.labels"))

but this doesn't take into account the brackets. It returns 1 and 4, as the terms dur(duration) and duranduran both start with dur. I am looking for a regular expression for:

Begins with dur(, AND ends with ).

CodePudding user response:

You can use

grep("^dur\\(.*\\)$", attr(terms(formula), "term.labels"))

Details:

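  • ^ - start of string
  • dur - the literal substring dur
  • \( - a literal ( character (written \\( inside an R string)
  • .* - zero or more of any character, as many as possible (greedy)
  • \) - a literal ) character (written \\) inside an R string)
  • $ - end of string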
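
As a quick check, the pattern can be run against the term labels from the question. This is only a minimal sketch: the vector labs is typed out by hand here rather than pulled from terms(), and the variable names are illustrative.

```r
# Term labels from the question, typed out directly (illustrative).
labs <- c("dur(duration)", "age", "sex", "duranduran")

# "^dur" alone also matches "duranduran".
prefix_only <- grep("^dur", labs)

# Requiring a literal "(" right after dur and a ")" at the very end
# keeps only the first term. Backslashes are doubled because the
# pattern is written as an R string.
wrapped <- grep("^dur\\(.*\\)$", labs)

print(prefix_only)
print(wrapped)
```

Here prefix_only holds both positions that start with dur, while wrapped holds only the position of dur(duration).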


CodePudding user response:

This uses the stringr package and assumes that the target function is always dur().

Let me know if you want to generalize it.

library(stringr)
formula <- formula(cured ~ dur(duration) + age + sex + duranduran)

elements = attr(terms(formula), "term.labels")
idx = str_which(elements, "^dur\\(\\w+\\)")
idx
#> [1] 1

Created on 2022-10-07 by the reprex package (v2.0.1)
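
One thing worth checking is how a \\w+ pattern behaves when the argument inside dur() is not a plain name. The sketch below is an assumption-laden illustration: it uses base grep instead of str_which so that it runs without stringr, and the extra labels (a nested call and a dotted name) are made up for the comparison.

```r
# Hypothetical labels: a plain name, a nested call, a dotted name.
labs <- c("dur(duration)", "dur(log(duration))", "dur(time.to.event)", "age")

# \\w+ only accepts letters, digits and underscores between the brackets,
# so the nested call and the dotted name are skipped.
narrow <- grep("^dur\\(\\w+\\)", labs)

# .* accepts anything between the opening "(" and the final ")".
broad <- grep("^dur\\(.*\\)$", labs)

print(narrow)
print(broad)
```

If the arguments may contain dots or nested calls, the broader .* form is probably the safer choice.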

CodePudding user response:

This does it without regular expressions: get the labels, then test them with startsWith and endsWith. Omit which if a logical vector is acceptable as the result.

labs <- labels(terms(formula))
which(startsWith(labs, "dur(") & endsWith(labs, ")"))
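
If the function name varies, the same idea can be wrapped in a small helper. This is a sketch under assumptions: which_wrapped is a hypothetical name, not part of base R, and it only checks the prefix and the final character, so it does not verify that the brackets are balanced.

```r
# Hypothetical helper: indices of labels that look like fn(...).
# No regex is involved, so fn needs no escaping (e.g. "my.fun").
which_wrapped <- function(labs, fn) {
  which(startsWith(labs, paste0(fn, "(")) & endsWith(labs, ")"))
}

# terms() does not evaluate dur(), so it need not exist as a function.
f <- cured ~ dur(duration) + age + sex + duranduran
labs <- labels(terms(f))

dur_idx <- which_wrapped(labs, "dur")
log_idx <- which_wrapped(labs, "log")

print(dur_idx)
print(log_idx)
```

Because the comparison is on fixed strings, duranduran is not picked up, and a function name that appears in no term gives an empty result.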

Note

The input, equivalent to but simplified from the one in the question:

formula <- cured ~ dur(duration) + age + sex + duranduran