How can I fit a linear model inside a user defined function in R; error non-numeric argument when sp

Time:06-04

I am trying to de-clutter some scripts by writing functions for repetitive tasks in R. One task I repeat often is fitting a linear model to a set of data and generating predictions from the fit. The data are concentration and flow measurements from streams. Flow is always the explanatory variable, but the response variable changes, so I would like to pass it in as a function argument. However, when I run the function I get a "non-numeric argument to mathematical function" error. I have tried passing the column name both with and without quotes, since the lm() call itself does not require quotes, but leaving the quotes off results in the classic "object 'myobject' not found" error. Here's a simple example.

Update

library(tibble)

flows <- seq(0, 7, 0.01)
dat <- tibble(flow = sample(flows, 30),
              parameter1_conc = rnorm(30, 15, 4),
              parameter2_conc = rnorm(30, 50, 8))

regr_func <- function(modeldata, parameter, pred_maxflow, pred_flowint) {
  # Build the formula from the column name passed as a string;
  # 'flow' is part of the string, not a separate object
  mod <- lm(as.formula(paste0('log(', parameter, ') ~ log(flow)')), data = modeldata)

  newflow <- data.frame(flow = seq(0, pred_maxflow, pred_flowint))

  preds <<- predict(mod, newdata = newflow,
                    interval = 'prediction')
}

regr_func(modeldata = dat,
          parameter = 'parameter1_conc',
          pred_maxflow = 20,
          pred_flowint = 0.001)

Original Example Error


flows <- seq(0, 7, 0.01)
dat <- tibble(flow = sample(flows, 30),
              parameter1_conc = rnorm(30, 15, 4),
              parameter2_conc = rnorm(30, 50, 8))

regr_func <- function(modeldata, parameter, pred_maxflow, pred_flowint) {
  mod <- lm(log(parameter) ~ log(flow), data = modeldata)

  newflow <- data.frame(flow = seq(0, maxflow, flowint))

  preds <<- predict(mod, newdata = newflow,
                    interval = 'prediction')
}

regr_func(modeldata = dat,
          parameter = 'parameter1_conc',
          pred_maxflow = 20,
          pred_flowint = 0.001)

CodePudding user response:

There are three issues here. The main one is that log(parameter) in your lm formula is not replaced by the column name you pass in as parameter. lm first looks for a column literally called parameter in your data, which doesn't exist, and then falls back to the function argument itself. With quotes, that argument is a character string, so log() fails with "non-numeric argument to mathematical function"; without quotes, R tries to evaluate parameter1_conc as an object and cannot find it. You can fix this by building a formula with the column name substituted in. Pasting strings together is the most common way to do this, but using substitute is a bit more efficient and safer. It also lets you pass the column name without quotes.
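To make the two approaches concrete, here is a minimal sketch that builds the same formula both ways. The helper names formula_from_string and formula_from_symbol are hypothetical and exist only for illustration; they are not part of the original question or answer.

```r
# Illustrative helpers (hypothetical names, not from the original post).

# String-based: the column name arrives as a character string.
formula_from_string <- function(parameter) {
  as.formula(paste0("log(", parameter, ") ~ log(flow)"))
}

# substitute-based: the column name arrives unquoted.
formula_from_symbol <- function(parameter) {
  # Replace the left-hand side (element 2) of a template formula
  `[[<-`(x ~ log(flow), 2, substitute(log(parameter)))
}

f_string <- formula_from_string("parameter1_conc")
f_symbol <- formula_from_symbol(parameter1_conc)

# Both formulas refer to the same two columns.
all.vars(f_string)
all.vars(f_symbol)
```

Either formula can then be passed to lm() together with data = modeldata.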

The second issue is that maxflow and flowint inside the function body should be pred_maxflow and pred_flowint to match your function's parameters.

Thirdly, using the <<- operator to assign to a variable outside the function (here, in the global environment) is bad practice. R users expect functions not to have such side effects, and know to store the output of a function call in a variable under their control. Only in very rare circumstances should <<- be used inside a function.
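As a small illustration of why this side effect can surprise a caller, the sketch below contrasts a function that assigns with <<- against one that simply returns its result. The function names and the toy calculation are made up for this example.

```r
# The caller already has something stored under the name 'preds'.
preds <- "an earlier result the caller wanted to keep"

# Hypothetical function that writes its result with <<-
fit_with_side_effect <- function(x) {
  preds <<- x * 2   # silently overwrites the caller's 'preds'
  invisible(NULL)
}

# Hypothetical function that returns its result instead
fit_returning_value <- function(x) {
  x * 2
}

fit_with_side_effect(1:3)
preds                            # the earlier value is gone

result <- fit_returning_value(1:3)   # the caller chooses the name
```

With the second form, nothing outside the function changes unless the caller assigns the return value.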

Putting all this together, we have:

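regr_func <- function(modeldata, parameter, pred_maxflow, pred_flowint) {

  # Start from a template formula and replace its left-hand side (element 2)
  # with log(<column>), where <column> is the unquoted name passed as parameter
  f <- `[[<-`(x ~ log(flow), 2, substitute(log(parameter)))

  mod <- lm(f, data = modeldata)

  newflow <- data.frame(flow = seq(0, pred_maxflow, pred_flowint))

  # Return the predictions instead of assigning them with <<-
  predict(mod, newdata = newflow, interval = 'prediction')
}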

And we would call the function like this:

preds <- regr_func(modeldata = dat,
                   parameter = parameter1_conc,
                   pred_maxflow = 20,
                   pred_flowint = 0.001)

resulting in:

head(preds)
#>        fit      lwr      upr
#> 1      Inf      NaN      NaN
#> 2 3.365491 2.188942 4.542041
#> 3 3.312636 2.219223 4.406049
#> 4 3.281717 2.236294 4.327140
#> 5 3.259780 2.248073 4.271488
#> 6 3.242765 2.256998 4.228531

Created on 2022-06-03 by the reprex package (v2.0.1)
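The Inf and NaN values in the first row of the output presumably come from the prediction grid starting at a flow of 0, where log(0) is -Inf. Below is a minimal sketch of one way around that, assuming it is acceptable to start the grid at pred_flowint instead of 0. The function name regr_func_pos and the simulated data (a plain data.frame with uniformly distributed, strictly positive concentrations) are illustrative assumptions, not part of the original answer.

```r
set.seed(1)  # arbitrary seed so the simulated data are reproducible

# Simulated data (assumption): strictly positive flows and concentrations
dat <- data.frame(flow = sample(seq(0.01, 7, 0.01), 30),
                  parameter1_conc = runif(30, 5, 25))

# Hypothetical variant of regr_func that avoids predicting at flow = 0
regr_func_pos <- function(modeldata, parameter, pred_maxflow, pred_flowint) {
  f <- `[[<-`(x ~ log(flow), 2, substitute(log(parameter)))
  mod <- lm(f, data = modeldata)
  # Start at pred_flowint rather than 0 so that log(flow) is finite
  newflow <- data.frame(flow = seq(pred_flowint, pred_maxflow, pred_flowint))
  predict(mod, newdata = newflow, interval = 'prediction')
}

preds <- regr_func_pos(modeldata = dat,
                       parameter = parameter1_conc,
                       pred_maxflow = 20,
                       pred_flowint = 0.5)

all(is.finite(preds))  # checks that no Inf or NaN values remain
```

Note that the predictions are on the log scale, as in the output above, because the response in the model is log(parameter1_conc).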
