R: instrument function to capture all assignments

Given a regular R function f, I'd like to be able to create a new function f_debug that acts just like f, but lets me keep track of all the assignments to function-local variables that happened inside it.

For example:

f <- function(x, y) {
  z <- x + y
  df <- data.frame(z=z)
  df
}

# This function doesn't work as intended. In the case of `f` above, I'd like it to
# write out a list containing `z` and `df` to an RDS file
capturing <- function(func) {
  e <- new.env()
  altered <- function(...) {
    parent <- parent.frame()
    e <- something...(func, environment(), parent, etc., etc.)
    result <- func(...)
    saveRDS(as.list(e), 'foo.rds')
    result
  }
  environment(func) <- e
  altered
}

f_debug <- capturing(f)

I'm not sure whether the knowledge gap here is large or small. Does anyone have a solution?

Answer:

Solution 1: Steal the function's code

Here's a solution that doesn't return a new function that captures intermediate calculations; instead, it evaluates the given function's body directly in a new environment. It has some limitations: it probably only works with named arguments, and default argument values are not filled in. Instead of storing the intermediate values in an RDS file, it attaches the evaluation environment to the result as an attribute.

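capturing <- function(fun, ...) {
  fun <- match.fun(fun)
  code <- body(fun)
  parent <- environment(fun)
  # New environment whose parent is the function's own enclosure,
  # so free variables resolve as they would inside `fun`
  env <- new.env(parent = parent)
  # Only named arguments can be placed into `env`
  args <- list(...)
  for (val in names(args)) {
    env[[val]] <- args[[val]]
  }
  # Evaluate the body in `env`; every local assignment lands there
  result <- eval(code, envir = env, enclos = parent.frame())
  attr(result, "intermediate") <- env
  result
}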

my_add <- function(x, y) {
  z <- x + y
  u <- x - y
  w <- x * y
  x + y
}

intermediates <- function(x) {
  attr(x, "intermediate", exact = TRUE)
}

value <- capturing(my_add, x = 1, y = 7)
ls(envir = intermediates(value))
#> [1] "u" "w" "x" "y" "z"
intermediates(value)$x
#> [1] 1
# Created on 2022-02-08 by the reprex package (v2.0.1)
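Solution 1 only copies named arguments into the evaluation environment. As a minimal sketch (my own variant, not part of the answer above), positional arguments can first be matched to the formals with match.call(). The name capturing_matched is hypothetical; the sketch assumes fun has no ... formal and that every formal is supplied, since default values are still not filled in.

```r
# Hypothetical variant of Solution 1: match positional or named arguments
# against fun's formals before evaluating the body.
# Assumes `fun` has no `...` formal and every formal is supplied
# (default values are NOT filled in).
capturing_matched <- function(fun, ...) {
  fun <- match.fun(fun)
  # Build a call such as fun(1, 7) from the already-evaluated arguments
  supplied <- as.call(c(quote(fun), list(...)))
  # match.call() rewrites it with full argument names, e.g. fun(x = 1, y = 7)
  matched <- as.list(match.call(fun, supplied))[-1L]
  env <- new.env(parent = environment(fun))
  for (nm in names(matched)) {
    assign(nm, matched[[nm]], envir = env)
  }
  result <- eval(body(fun), envir = env)
  attr(result, "intermediate") <- env
  result
}

my_add <- function(x, y) {
  z <- x + y
  u <- x - y
  w <- x * y
  x + y
}

value <- capturing_matched(my_add, 1, 7)  # positional arguments now work
ls(envir = attr(value, "intermediate"))
#> [1] "u" "w" "x" "y" "z"
```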

Solution 2: Modify the function's code

One weakness of this solution: if the chosen function calls on.exit() without add = TRUE (the default is add = FALSE), that call replaces the inserted handler, so some additional work is needed to capture the internal environment. On the other hand, it does work when the function accepts ... arguments.

my_add <- function(x, y) {
  z <- x + y
  u <- x - y
  w <- x * y
  x + y
}

insert_capture <- function(code) {
  # `<<-` assigns into the global environment if no variable of the given name is found
  # while traveling up to the global environment. If you need this assignment to go elsewhere,
  # I'd recommend using `assign()` with an explicit `envir`. Of course, you could also modify
  # the `on.exit()` to use saveRDS().
  # Assumes the body is wrapped in `{ ... }`, so the first deparsed line is the opening `{`.
  parse(text = append(deparse(code),
                      "on.exit(._last_capture <<- environment(), add = TRUE)",
                      after = 1L))
}
capturing2 <- function(fun) {
  fun <- match.fun(fun)
  code <- insert_capture(body(fun))
  body(fun) <- code
  fun 
}

my_add2 <- capturing2(my_add)

my_add2(1, 7)
#> [1] 8
ls(envir = ._last_capture)
#> [1] "u" "w" "x" "y" "z"
._last_capture$u
#> [1] -6

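The comment inside insert_capture() suggests using saveRDS() in the on.exit() call instead of `<<-`. A minimal sketch of that idea, which is also what the question asked for (a list of the locals written to an RDS file): capturing_rds and its file argument are hypothetical names, the body is edited as a language object rather than through deparse()/parse(), and it assumes the body of fun is wrapped in braces.

```r
# Hypothetical variant of Solution 2: the injected on.exit() handler writes
# the function's local variables to an RDS file instead of using `<<-`.
# Assumes the body of `fun` is wrapped in `{ ... }`.
capturing_rds <- function(fun, file) {
  fun <- match.fun(fun)
  # bquote() inlines the file path into the handler expression
  handler <- bquote(
    on.exit(saveRDS(as.list(environment()), .(file)), add = TRUE)
  )
  # Insert the handler as the first statement after the opening `{`
  body(fun) <- as.call(append(as.list(body(fun)), list(handler), after = 1L))
  fun
}

f <- function(x, y) {
  z <- x + y
  df <- data.frame(z = z)
  df
}

rds_path <- tempfile(fileext = ".rds")
f_debug <- capturing_rds(f, rds_path)
f_debug(1, 2)
locals <- readRDS(rds_path)  # named list holding x, y, z and df
```

Because the handler is registered with add = TRUE, it still runs if the function exits with an error, but like Solution 2 it is discarded by any later on.exit() call that uses add = FALSE.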

Answer:

What you are describing is already implemented in base R with utils::dump.frames, in an even more sophisticated way. It saves the frame (environment) associated with each call in the call stack to an object of class "dump.frames", which you can explore retroactively with utils::debugger as if you had actually run your code under a debugger.

capturing <- function(func, ...) {
    cc <- as.call(c(quote(utils::dump.frames), list(...)))
    cc <- call("on.exit", cc, add = TRUE)
    body(func) <- call("{", cc, body(func))
    func
}

capturing injects the call on.exit(utils::dump.frames(...), add = TRUE) at the start of the body of func and returns the modified function. Here, ... is a list of arguments to dump.frames:

- dumpto, a character string giving the name to be used for the "dump.frames" object
- to.file, a logical flag indicating whether the "dump.frames" object should be assigned in the global environment (FALSE) or saved with save() to paste0(dumpto, ".rda") in the current working directory (TRUE)
- include.GlobalEnv, a logical flag indicating whether the global environment should be saved as well
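The worked example in this answer uses to.file = TRUE. As a sketch of the default to.file = FALSE path, where nothing is written to disk and the "dump.frames" object is assigned in the global environment instead (f_frames is an arbitrary name chosen for illustration; capturing is repeated so the block runs on its own):

```r
# capturing() from this answer, repeated so this block is self-contained
capturing <- function(func, ...) {
  cc <- as.call(c(quote(utils::dump.frames), list(...)))
  cc <- call("on.exit", cc, add = TRUE)
  body(func) <- call("{", cc, body(func))
  func
}

f <- function(x, y) {
  z <- x + y
  z + 1
}

# to.file = FALSE (the default): dump.frames() assigns the object
# in the global environment under the name given by dumpto
g <- capturing(f, dumpto = "f_frames")
g(1, 2)

frames <- get("f_frames", envir = globalenv())
# The last frame belongs to the call g(1, 2)
g_env <- frames[[length(frames)]]
ls(g_env)
```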

A quick example, which you should try yourself:

tmp <- tempfile()
dir.create(tmp)
cwd <- setwd(tmp)

f <- function(x, y) {
    z <- x + y
    z + 1
}
g <- capturing(f, dumpto = "zzz", to.file = TRUE)
h <- function(a, b) {
    d <- g(a, b)
    d + 1
}
h12 <- h(1, 2)

load("zzz.rda")
zzz
## $`h(1, 2)`
## <environment: 0x14c16cb58>
## 
## $`#2: g(a, b)`
## <environment: 0x14c16ca40>
## 
## attr(,"error.message")
## [1] ""
## attr(,"class")
## [1] "dump.frames"

ls(zzz[[1L]])
## [1] "a" "b"

ls(zzz[[2L]])
## [1] "x" "y" "z"

utils::debugger(zzz)
## Message:  Available environments had calls:
## 1: h(1, 2)
## 2: #2: g(a, b)
## 
## Enter an environment number, or 0 to exit  
## Selection: 2
## Browsing in the environment with call:
##    #2: g(a, b)
## Called from: debugger.look(ind)
## Browse[1]> ls()
## [1] "x" "y" "z"
## Browse[1]> x == 1 && y == 2 && z == x + y
## [1] TRUE
## Browse[1]> Q

setwd(cwd)
unlink(tmp, recursive = TRUE)

See ?browser if you are unfamiliar with R's environment browser.

My capturing function has one limitation: any on.exit calls in the body of func must also use add = TRUE; otherwise they replace the injected handler and nothing is dumped. If you have written func yourself, this is hardly a limitation, and passing add = TRUE is a good habit anyway.
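The limitation comes from on.exit() semantics: with the default add = FALSE, a later on.exit() call discards every handler registered before it, including one injected at the top of the body. A small self-contained illustration, where the events vector and the two function names are made up for the demo and a string stands in for the injected dump.frames() call:

```r
# Record which exit handlers actually run
events <- character()

replaces <- function() {
  # stands in for the handler injected by capturing()
  on.exit(events <<- c(events, "injected"), add = TRUE)
  # add = FALSE (the default): discards the handler above
  on.exit(events <<- c(events, "user"))
  invisible(NULL)
}

appends <- function() {
  on.exit(events <<- c(events, "injected"), add = TRUE)
  # add = TRUE: both handlers run, in the order they were registered
  on.exit(events <<- c(events, "user"), add = TRUE)
  invisible(NULL)
}

replaces()
after_replace <- events  # only "user"
events <- character()
appends()
after_append <- events   # "injected" then "user"
```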

Ultimately, there is no completely safe way to inject code into functions, but, in an interactive setting, I would say that this level of "unsafety" is fine.