Replace expressions in a source file from another source file in R

Hello, I have the following problem.

Say I have a file base.R:

x <- 1
# comment
y <- Y ~ X1 +
         X2
# comment 2
z <- function(x) {
  x + 1
}
t <- z(x)

and another file override.R:

x <- 2
y <- Y ~ X1 + X3

My goal is to create another file, new.R, which is essentially base.R overridden by override.R:

x <- 2
# comment
y <- Y ~ X1 + X3
# comment 2
z <- function(x) {
  x + 1
}
t <- z(x)

Obviously, if all expressions in base.R were one-liners I could use sed, but unfortunately that is not the case. Note that I only need it to work for assignments of the form lhs <- rhs, although ideally lhs = rhs would work as well.

EDIT: the above is a minimal version of my actual problem.

CodePudding user response：

If you can accept comments being stripped, then this might suffice for you:

Starting with base.R:

x <- 1
# comment
y <- Y ~ X1 +
         X2
# comment 2
z <- function(x) {
  x + 1
}
t <- z(x)

and override.R:

x <- 2
y <- Y ~ X1 + X3

We can run:

base <- parse("base.R")
override <- parse("override.R")

# Name assigned by a top-level `<-` or `=` expression; NA for anything else,
# so that non-assignments (e.g. a bare call) are never treated as matches.
assigned_name <- function(z) {
  if (is.call(z) && as.character(z[[1]])[1] %in% c("<-", "="))
    paste(deparse(z[[2]]), collapse = "")
  else NA_character_
}
base_lhs <- vapply(base, assigned_name, character(1))
override_lhs <- vapply(override, assigned_name, character(1))

# Replace each base assignment whose left-hand side also appears in override.R.
matches <- match(base_lhs, override_lhs, incomparables = NA_character_)
base[which(!is.na(matches))] <- override[na.omit(matches)]

writeLines(unlist(lapply(base, deparse)), "new.R")

and now we have new.R with

x <- 2
y <- Y ~ X1 + X3
z <- function(x) {
    x + 1
}
t <- z(x)

For the sake of discussion: in order to retain comments we would likely need to use getParseData:

1. iterate over $parent and $id so that the $line1 references of each expression can be combined, and store this reduced line1 in a new variable (since we will need to remove the originals from getParseData(base));
2. find all rows with $token == "SYMBOL" where a $token == "LEFT_ASSIGN" appears later in the same expression. This starts to hobble a little when we have "EQ_ASSIGN" or, more of a challenge, "RIGHT_ASSIGN" (since the presumed order of symbols changes);
3. use the object names found in step 2 (the names to which assignments occur) to compare the base and override parse data;
4. replace the matching subset of the base version's parse data with the override's;
5. find a way to recombine the resulting parse data into a source file.
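As a rough illustration of the comment-preserving idea, the sketch below splices source lines instead of rebuilding a file from the parse data. Note that it swaps in the srcref attributes that parse(keep.source = TRUE) attaches to each top-level expression rather than walking getParseData. The helper names assigned_name, top_level and override_lines are made up for this example, and it assumes every statement starts and ends on its own lines (no two statements sharing a line, and any trailing comment on a replaced line is lost).

```r
# Sketch only: `assigned_name`, `top_level` and `override_lines` are
# hypothetical helpers, not part of base R.

# Name assigned by a top-level `<-` or `=` expression, NA otherwise.
assigned_name <- function(z) {
  if (is.call(z) && as.character(z[[1]])[1] %in% c("<-", "="))
    paste(deparse(z[[2]]), collapse = "")
  else NA_character_
}

# One row per top-level expression: assigned name plus first/last source line.
top_level <- function(lines) {
  exprs <- parse(text = lines, keep.source = TRUE)
  refs <- attr(exprs, "srcref")
  data.frame(
    lhs   = vapply(exprs, assigned_name, character(1)),
    first = vapply(refs, function(s) as.integer(s)[1], integer(1)),
    last  = vapply(refs, function(s) as.integer(s)[3], integer(1)),
    stringsAsFactors = FALSE
  )
}

# Replace the lines of each matching base assignment with the override's lines.
override_lines <- function(base_lines, over_lines) {
  b <- top_level(base_lines)
  o <- top_level(over_lines)
  m <- match(b$lhs, o$lhs, incomparables = NA_character_)
  # Work bottom-up so earlier line numbers stay valid after each splice.
  for (i in rev(which(!is.na(m)))) {
    new <- over_lines[o$first[m[i]]:o$last[m[i]]]
    base_lines <- append(base_lines[-(b$first[i]:b$last[i])], new,
                         after = b$first[i] - 1)
  }
  base_lines
}

# Usage with the files from the question, inlined as character vectors.
base_lines <- c("x <- 1", "# comment", "y <- Y ~ X1 +", "         X2",
                "# comment 2", "z <- function(x) {", "  x + 1", "}",
                "t <- z(x)")
over_lines <- c("x <- 2", "y <- Y ~ X1 + X3")
new_lines <- override_lines(base_lines, over_lines)
writeLines(new_lines)
```

With real files, base_lines and over_lines would come from readLines("base.R") and readLines("override.R"), and the result would go to writeLines(new_lines, "new.R").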

I ran out of time trying to get this to work elegantly and robustly, so I offer it as an illustration of the effort required to retain comments.

I suggest that if your intent is to allow a single source file of overriding expressions, it makes sense to keep base.R untouched (as in your question) and create a temporary new.R that is sourced and then discarded, in which case its comments are irrelevant.
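A minimal sketch of that temporary-file workflow follows. It assumes the merged expressions are already available; here merged is a hard-coded stand-in, and the names tmp and env are illustrative only.

```r
# Stand-in for the merged expressions (base.R with override.R applied).
merged <- parse(text = c("x <- 2", "z <- function(x) x + 1", "t <- z(x)"))

# Write them to a throwaway file, source it, then delete it.
tmp <- tempfile(fileext = ".R")
writeLines(unlist(lapply(merged, deparse)), tmp)

env <- new.env()  # keeps the sourced objects out of the global environment
source(tmp, local = env)
unlink(tmp)

env$t
```

Sourcing into a separate environment is a design choice for the example; source(tmp) on its own would define the objects in the global environment instead.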

CodePudding user response：

This assumes that the statements to be overridden are the first n statements of base.R. Determine the number of statements n in override.R. Then parse base.R and find ix, the last line before the first statement that is not overridden. Among the first ix lines, find mx, the last line that is not a comment. Finally, write out override.R followed by all but the first mx lines of base.R. In the code below, replace stdout() with the name of the desired output file, e.g. "outfile.R".

library(utils)

n <- length(parse("override.R"))
# keep.source = TRUE so that parse data is available even when run via Rscript
g <- getParseData(parse("base.R", keep.source = TRUE))
ix <- g$line1[g$parent == 0][n + 1] - 1

baseLines <- readLines("base.R")
is_comment <- grepl("^\\s*#", head(baseLines, ix))
mx <- max(which(!is_comment))

overrideLines <- readLines("override.R")
writeLines(c(overrideLines, tail(baseLines, -mx)), stdout())

giving:

x <- 2
y <- Y ~ X1 + X3
# comment 2
z <- function(x) {
  x + 1
}
t <- z(x)

Alternative

If you control base.R, a simpler approach is to mark the end of the portion to be overridden. Suppose we put #--- on a line by itself in base.R between the portion to override and the rest. Then the code reduces to:

overrideLines <- readLines("override.R")
baseLines <- readLines("base.R")
ix <- grep("#---", baseLines)[1]
writeLines(c(overrideLines, tail(baseLines, -ix)), stdout())

or possibly, in base.R, check if x has already been defined and only define it if not. Ditto for y. Then it is just a matter of concatenating the two files or sourcing one after the other.

if (!exists("x")) x <- ...whatever...
if (!exists("y")) y <- ...whatever...
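To make the guard idea concrete, here is a small sketch that evaluates an override and then a guarded base in a scratch environment. The code strings, the default values 1 and 10, and the names env and total are invented for the example. It passes inherits = FALSE, which is an addition to the snippet above, so that same-named objects in enclosing environments or attached packages (base::t, for instance) do not count as already defined.

```r
# Invented stand-ins for the contents of override.R and a guarded base.R.
override_code <- "x <- 2"
base_code <- c(
  'if (!exists("x", inherits = FALSE)) x <- 1',
  'if (!exists("y", inherits = FALSE)) y <- 10',
  "total <- x + y"
)

env <- new.env()
eval(parse(text = override_code), env)  # overrides first
eval(parse(text = base_code), env)      # base only fills in what is missing

c(x = env$x, y = env$y, total = env$total)
```

Here x keeps the overridden value while y falls back to the default from the guarded base code.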

Yet another possibility is to define a function whose defaults are the current values of x and y in base.R. Then we can call it as f() to get the defaults or specify them.

f <- function(x = ..., y = ...) {  ...base.R code except  x and y ...}
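As a sketch of that last option using the example from the question, the defaults below are the values from base.R. Returning the results in a list is an assumption made here so the example has something to inspect.

```r
# base.R wrapped in a function; x and y default to the values from base.R.
f <- function(x = 1, y = Y ~ X1 + X2) {
  z <- function(x) {
    x + 1
  }
  t <- z(x)
  list(x = x, y = y, t = t)
}

defaults   <- f()                           # behaves like base.R
overridden <- f(x = 2, y = Y ~ X1 + X3)     # behaves like new.R

defaults$t
overridden$t
```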