This question may seem basic, but it has bothered me for quite a while. The help pages for many functions list ... as one of their arguments, but somehow I can never get my head around this ... thing.
For example, suppose I have created a model, say model_xgboost, and want to make a prediction based on a dataset, say data_tbl, using the predict() function, and I want to know the syntax. So I look at its help page, which says:
?predict
**Usage**
predict (object, ...)
**Arguments**
object a model object for which prediction is desired.
... additional arguments affecting the predictions produced.
The usage and examples didn't really enlighten me, as I still have no idea what the valid syntax/arguments are for the function. An online course uses something like the code below, which works:
data_tbl %>%
predict(model_xgboost, new_data = .)
However, looking through the help page, I cannot find the new_data argument. Instead, it mentions a newdata argument in its Details section, which doesn't work if I replace new_data = . with newdata = . :
Error in `check_pred_type_dots()`:
! Did you mean to use `new_data` instead of `newdata`?
My questions are:
- How do I know exactly what argument(s) / syntax can be used for a function like this?
- Why new_data but not newdata in this example?
- I might be missing something here, but is there any reference/resource about how to use/interpret a help page, in plain English? (A lot of documentation, including R help files, seems to give just a brief sentence like "additional arguments affecting the predictions produced".)
CodePudding user response:
@CarlWitthoft's answer is good; I want to add a little bit of nuance about this particular function. The reason the help page for ?predict is so vague is an unfortunate consequence of the fact that predict() is a generic function in R: that is, it's a function that can be applied to a variety of different object types, using a slightly different (but appropriate) method in each case. As such, the ?predict help page only lists object (which is required as the first argument in all methods) and ..., because different predict methods could take very different arguments/options.
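To make the idea of a generic concrete, here is a minimal sketch using a made-up generic called describe (it is hypothetical and not part of any package). Like predict(), the generic itself only declares object and ..., while each method adds its own named arguments:

```r
# Hypothetical generic: like predict(), it only declares `object` and `...`
describe <- function(object, ...) {
  UseMethod("describe")
}

# Method for numeric vectors, with its own extra argument `digits`
describe.numeric <- function(object, digits = 2, ...) {
  paste("mean =", round(mean(object), digits))
}

# Method for character vectors, with a different extra argument `sep`
describe.character <- function(object, sep = ", ", ...) {
  paste(object, collapse = sep)
}

# The same call form reaches different methods, each with its own options
res_num <- describe(c(1.234, 2.345), digits = 1)
res_chr <- describe(c("a", "b"), sep = "-")
print(res_num)
print(res_chr)
```

A help page for describe itself could only document object and ...; digits and sep would have to be documented on the pages of the individual methods, which is the situation with ?predict.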
If you call methods("predict") in a clean R session (before loading any additional packages), you'll see a list of 16 methods that base R knows about. After loading library("tidymodels"), the list expands to 69 methods (the exact counts depend on your R and package versions). I don't know what class your object is (check with class(model_xgboost)), but assuming that it's of class model_fit, we look at ?predict.model_fit to see:
predict(object, new_data, type = NULL, opts = list(), ...)
This tells us that we need to call the new data new_data (and, reading a bit farther down, that it needs to be "A rectangular data object, such as a data frame").
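The same lookup can be done from the console. The sketch below uses a base R lm fit on the built-in mtcars data rather than a tidymodels object, purely so that it runs without extra packages; the steps (find the class, find the method, read its arguments) should carry over to other model classes:

```r
# Fit a small linear model on a built-in dataset (base R only)
fit <- lm(mpg ~ wt, data = mtcars)

# 1. Which class is the object? This decides which predict method is used.
fit_class <- class(fit)
print(fit_class)

# 2. Look up the method for that class and list its argument names
predict_lm <- utils::getS3method("predict", "lm")
arg_names <- names(formals(predict_lm))
print(arg_names)

# 3. `newdata` is a named argument of this particular method, so it works here
pred <- predict(fit, newdata = data.frame(wt = 3))
print(pred)
```

Once you know the method name, ?predict.lm (or ?predict.model_fit for a parsnip fit) is the help page that actually documents the arguments.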
The help page for predict says:
Most prediction methods which are similar to those for linear models have an argument ‘newdata’ specifying the first place to look for explanatory variables to be used for prediction
(emphasis added). I don't know for certain why the parsnip authors (the predict.model_fit method comes from the parsnip package) decided to use new_data rather than newdata; presumably it is in line with the tidyverse style guide, which says:
Use underscores (_) (so called snake case) to separate words within a name.
In my opinion this might have been a mistake, but you can see that the parsnip/tidymodels authors have realized that people are likely to type newdata out of habit and added an informative error message, as shown in your example.
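As a rough illustration of how such a message can arise, here is a sketch of a function that expects new_data and inspects ... for a misspelled newdata. The function predict_sketch is hypothetical and is not parsnip's actual implementation; it only shows the general mechanism. Because newdata is not a prefix of new_data, R does not match it to that argument, so it lands in ...:

```r
# Hypothetical function: expects `new_data`, collects everything else in `...`
predict_sketch <- function(object, new_data, ...) {
  dots <- list(...)
  # A misspelled `newdata` is not matched to `new_data`; it ends up in `...`,
  # where it can be detected by name before `new_data` is ever used.
  if ("newdata" %in% names(dots)) {
    stop("Did you mean to use `new_data` instead of `newdata`?", call. = FALSE)
  }
  nrow(new_data)
}

# Correct spelling: the data frame is matched to `new_data`
ok <- predict_sketch(NULL, new_data = mtcars)

# Misspelled argument: caught and reported with a helpful message
msg <- tryCatch(
  predict_sketch(NULL, newdata = mtcars),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Without a check like this, a misspelled argument passed through ... may be silently ignored, which is one reason the vague ... entry in a help page deserves a closer look.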
CodePudding user response:
Among other things, the existence of ... in a function definition means the function accepts additional arguments (values, functions, etc.) beyond the ones it names. There are some cases where the main function does not even use the ... itself but passes it on to functions called inside the main function. Simple example:
foo <- function(x, ...) {
  y <- x^2
  # Anything supplied via ... (e.g. col, main, type) is forwarded to plot()
  plot(x, y, ...)
}
I know of functions which accept a function as an input argument, in which case the arguments you can pass via ... are specific to the function you supplied.
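A small sketch of that last case, using base R's sapply(), which takes a function FUN and forwards ... to it. The data here is made up for illustration:

```r
x <- list(a = c(1, NA, 3), b = c(4, 5, NA))

# na.rm is not an argument of sapply(); it travels through ... to mean()
means <- sapply(x, mean, na.rm = TRUE)
print(means)

# With a different FUN, the valid extra arguments change accordingly:
# probs belongs to quantile(), not to sapply() or mean()
medians <- sapply(x, quantile, probs = 0.5, na.rm = TRUE)
print(medians)
```

So to find out what may go into ... here, you read the help page of the function you passed in (?mean, ?quantile), not ?sapply.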