Error when using nls for positive coefficient constraint

I'm trying to run a regression with all coefficients constrained to be non-negative (greater than or equal to zero). To do this, I am using the nls function. However, I get this error:

"Error in nls(formula = y ~ . - 1, data = X, start = low, lower = low, : parameters without starting value in 'data': ."

I believe everything is correct here: I set a lower and an upper bound on every variable, so I am not sure what is wrong.

Attempt 1:

library(magrittr)
X <- data.frame(
    x1 = seq(10),
    x2 = seq(10),
    x3 = seq(10),
    x4 = seq(10),
    x5 = seq(10),
    y = seq(10)
)

low <- dplyr::select(X, -y) %>% names %>% lapply( function(e) 0)
up <-  dplyr::select(X, -y) %>% names %>% lapply( function(e) Inf)
names(low) <- dplyr::select(X, -y) %>% names -> names(up)


fit1 <- nls(formula = y ~ . -1 , data = X,
    start = low,
    lower = low,
    upper = up,
    algorithm = "port"
)

Attempt 2:
Here I build the formula manually, but then I get a different error:
"Error in qr(.swts * gr) : dims [product 5] do not match the length of object [10]"

library(magrittr)
X <- data.frame(
    x1 = seq(10),
    x2 = seq(10),
    x3 = seq(10),
    x4 = seq(10),
    x5 = seq(10),
    y = seq(10)
)

n <- X %>% dplyr::select( -y ) %>% names %>% paste0( collapse = " + " )
f <- "y ~ %s -1" %>% sprintf( n ) %>% as.formula

low <- dplyr::select(X, -y) %>% names %>% lapply( function(e) 0)
up <-  dplyr::select(X, -y) %>% names %>% lapply( function(e) Inf)
names(low) <- dplyr::select(X, -y) %>% names -> names(up)


fit1 <- nls(formula = f , data = X,
    start = low,
    lower = low,
    upper = up,
    algorithm = "port"
)

How can I fix this? Thanks!

Answer:

1) There are several problems here:

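- nls does not use the same formula notation as lm. In nls, "." is not expanded to the remaining columns (it is treated as an unknown name, hence the error about "." having no starting value), and "-1" literally subtracts 1 instead of dropping the intercept. Also, the names in start define the parameters, so naming them x1 to x5 as in attempt 2 turns those data columns into scalar parameters. Each coefficient has to be written out explicitly; this is fixed below.
- The example does not have identifiable parameters: all five predictors are identical, so the coefficients are not unique and the calculation will fail. Below we change the example.
- Although 0 starting values seem to work here, in general numeric optimization with constraints tends to work better if the starting values are in the interior of the feasible region.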
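The first two problems can be checked directly in base R. The following is a small sketch (not part of the original answer) on the question's data: all.vars() shows that nls sees "." as just another symbol in the formula, and the rank of the predictor matrix shows that the five identical columns carry only one column's worth of information.

```r
# Data from the question: five identical predictors and y = 1..10
X <- data.frame(x1 = seq(10), x2 = seq(10), x3 = seq(10),
                x4 = seq(10), x5 = seq(10), y = seq(10))

# nls() does not expand `.`: it is one more symbol in the formula, and since it
# is neither a column of X nor a name in `start`, it has no starting value.
vars_in_formula <- all.vars(y ~ . - 1)
print(vars_in_formula)

# The predictor matrix has rank 1 rather than 5, so infinitely many coefficient
# vectors fit equally well (the parameters are not identifiable).
design_rank <- qr(as.matrix(X[, c("x1", "x2", "x3", "x4", "x5")]))$rank
print(design_rank)
```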

Using the above we have

set.seed(123)
X <- data.frame(
    x1 = rnorm(10),
    x2 = rnorm(10),
    x3 = rnorm(10),
    x4 = rnorm(10),
    x5 = rnorm(10),
    y = rnorm(10)
)

fo <- y ~ b1 * x1 + b2 * x2 + b3 * x3 + b4 * x4 + b5 * x5
st <- c(b1 = 1, b2 = 1, b3 = 1, b4 = 1, b5 = 1)
nls(fo, X, start = st, lower = numeric(5), algorithm = "port")

giving:

Nonlinear regression model
  model: y ~ b1 * x1 + b2 * x2 + b3 * x3 + b4 * x4 + b5 * x5
   data: X
    b1     b2     b3     b4     b5 
0.0000 0.1222 0.0000 0.2338 0.1457 
 residual sum-of-squares: 6.477

Algorithm "port", convergence message: relative convergence (4)
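One way to sanity-check the constrained fit (a sketch, not part of the original answer) is to compare it with the unconstrained least-squares fit from lm(), where y ~ . - 1 does mean "all other columns, no intercept". Dropping the constraint can only lower the residual sum of squares, so the constrained RSS should be at least as large.

```r
set.seed(123)
X <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10),
                x4 = rnorm(10), x5 = rnorm(10), y = rnorm(10))

# Constrained fit, as in the answer above
fo <- y ~ b1 * x1 + b2 * x2 + b3 * x3 + b4 * x4 + b5 * x5
st <- c(b1 = 1, b2 = 1, b3 = 1, b4 = 1, b5 = 1)
fit_con <- nls(fo, X, start = st, lower = numeric(5), algorithm = "port")

# In lm(), `.` expands to all non-response columns and `- 1` drops the intercept
fit_ols <- lm(y ~ . - 1, data = X)

print(cbind(constrained = coef(fit_con), unconstrained = coef(fit_ols)))
# deviance() returns the residual sum of squares for both nls and lm fits
print(c(constrained = deviance(fit_con), unconstrained = deviance(fit_ols)))
```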

2) The nnls (non-negative least squares) package can do this directly. We use X defined in (1).

library(nnls)
nnls(as.matrix(X[-6]), X$y)

giving the following

Nonnegative least squares model
x estimates: 0 0.1221646 0 0.2337857 0.1457373 
residual sum-of-squares: 6.477
reason terminated: The solution has been computed sucessfully.
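If installing a package is not an option, the same non-negative least-squares problem can be sketched in base R with optim() and its box-constrained "L-BFGS-B" method. This swaps in a general-purpose optimizer rather than a dedicated NNLS algorithm, so treat it as an approximation: it minimizes the residual sum of squares with every coefficient bounded below by 0, supplying the analytic gradient to help convergence.

```r
set.seed(123)
X <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10),
                x4 = rnorm(10), x5 = rnorm(10), y = rnorm(10))

A <- as.matrix(X[-6])  # predictor matrix (all columns except y)
b <- X$y

# Residual sum of squares and its gradient with respect to the coefficients
rss <- function(beta) sum((b - A %*% beta)^2)
rss_grad <- function(beta) as.vector(-2 * crossprod(A, b - A %*% beta))

# lower = 0 is recycled to every coefficient
opt <- optim(par = rep(1, ncol(A)), fn = rss, gr = rss_grad,
             method = "L-BFGS-B", lower = 0)
beta_hat <- setNames(opt$par, colnames(A))
print(round(beta_hat, 4))
print(opt$value)  # residual sum of squares at the solution
```

Because the problem is a convex quadratic, the optimizer should land on the same solution as nls() and nnls(), up to numerical tolerance.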

Answer:

This is a partial answer: combine it with @G.Grothendieck's answer to handle your question about "what if you have too many variables to type out by hand".

As implied by the comment thread, the model you're trying to set up doesn't include an intercept by default. The easiest way to handle this is probably to add a column of 1s to your data frame (mydata <- data.frame(x0 = 1, mydata)).
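A minimal sketch of the intercept suggestion, using the data from the first answer. Putting x0 = 1 first keeps y as the last column, and an explicit b0 * x0 term then acts as the intercept. The choice to leave b0 unconstrained (lower bound -Inf) is an assumption; set its lower bound to 0 instead if the intercept should also be non-negative.

```r
set.seed(123)
X <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10),
                x4 = rnorm(10), x5 = rnorm(10), y = rnorm(10))

# Prepend a constant column; y stays the last column
X1 <- data.frame(x0 = 1, X)
print(names(X1))

# With an explicit b0 * x0 term, nls() estimates an intercept b0.
# Assumption: only the slopes are constrained; b0 may be negative.
fo <- y ~ b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3 + b4 * x4 + b5 * x5
st <- c(b0 = 0, b1 = 1, b2 = 1, b3 = 1, b4 = 1, b5 = 1)
fit_int <- nls(fo, X1, start = st,
               lower = c(-Inf, rep(0, 5)), algorithm = "port")
print(coef(fit_int))
```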

## define variable names and parameter names
nx <- ncol(X)-1
vars <-  names(X)[1:nx]  ## assumes response is *last* column
pars <- gsub("x", "b", vars)

## construct formula
form <- reformulate(response = "y",
     sprintf("%s*%s", pars, vars))
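## bounds and starting values (start in the interior of the feasible region)
lwr <- setNames(rep(0, nx), pars)
upr <- setNames(rep(Inf, nx), pars)
start <- setNames(rep(1, nx), pars)

## fit
nls(form, data = X, start = start, lower = lwr, upper = upr, algorithm = "port")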
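Putting the pieces together, here is a sketch of a small wrapper that builds the formula, bounds and starting values from the column names and then calls nls(). The function name fit_nonneg and the "b_" prefix for parameter names are hypothetical choices, not from either answer. It assumes every non-response column is numeric with a syntactic R name, and, when intercept = TRUE, that the intercept is left unconstrained.

```r
# Hypothetical helper: non-negative linear regression via nls(algorithm = "port"),
# with the formula built from the column names of `data`.
fit_nonneg <- function(data, response, intercept = FALSE) {
  vars <- setdiff(names(data), response)
  pars <- paste0("b_", vars)                 # one parameter per predictor
  rhs <- sprintf("%s * %s", pars, vars)      # e.g. "b_x1 * x1"
  lwr <- setNames(rep(0, length(vars)), pars)
  start <- setNames(rep(1, length(vars)), pars)
  if (intercept) {
    # Assumption: the intercept b_0 is unconstrained (lower bound -Inf)
    rhs <- c("b_0", rhs)
    lwr <- c(b_0 = -Inf, lwr)
    start <- c(b_0 = 0, start)
  }
  form <- reformulate(rhs, response = response)
  nls(form, data = data, start = start, lower = lwr, algorithm = "port")
}

# Usage on the data from the first answer
set.seed(123)
X <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10),
                x4 = rnorm(10), x5 = rnorm(10), y = rnorm(10))
fit <- fit_nonneg(X, "y")
print(coef(fit))
```

Building the formula as text and passing it through reformulate() is what removes the need to type out every term by hand.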