Consider geom_smooth() from ggplot2, where we can set whether we want to see confidence intervals (the se argument) and how wide the interval is (the level argument). For example:
library(ggplot2)
df <- data.frame(x = rnorm(100), y = rnorm(100))
# layers are added to a plot with `+`
ggplot(df, aes(x, y)) + geom_smooth(se = TRUE, level = .95)
I see no need for two separate arguments: if we set a level, we obviously want to see the confidence interval, so in that case the se argument is redundant. On the other hand, if we choose se = FALSE, the level argument is redundant. Therefore, to me it is intuitive to combine both pieces of information in one argument. So my definition of the function would be something like this:
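my_smooth <- function(lev, ...) {
  if (is.null(lev)) {
    # NULL means: do not draw the confidence band
    geom_smooth(se = FALSE, ...)
  } else {
    # otherwise lev is the confidence level of the band
    geom_smooth(se = TRUE, level = lev, ...)
  }
}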
So in my_smooth() there is one argument: we either choose NULL to hide the confidence interval, or we pass the level we want to see. Of course, we could add lev = .95 as the default if we want.
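A minimal sketch of that variant, assuming a default of lev = 0.95 and forwarding of the remaining arguments. The helper name smooth_args and the range check on lev are illustrative assumptions, not part of ggplot2; the helper only exists so the argument logic can be inspected without drawing a plot.

```r
# Hypothetical helper: translate the single `lev` argument into the
# se/level pair that geom_smooth() expects.
smooth_args <- function(lev = 0.95, ...) {
  if (is.null(lev)) {
    # NULL is the sentinel for "no confidence band"
    return(list(se = FALSE, ...))
  }
  # Assumed validation: a single number strictly between 0 and 1
  stopifnot(is.numeric(lev), length(lev) == 1, lev > 0, lev < 1)
  list(se = TRUE, level = lev, ...)
}

# Wrapper with a default, so my_smooth() works without arguments
my_smooth <- function(lev = 0.95, ...) {
  do.call(ggplot2::geom_smooth, smooth_args(lev, ...))
}

str(smooth_args())       # se = TRUE, level = 0.95
str(smooth_args(NULL))   # se = FALSE
```

With ggplot2 loaded, this would be used as ggplot(df, aes(x, y)) + my_smooth(NULL) or ggplot(df, aes(x, y)) + my_smooth(0.9, method = "lm").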
In my opinion this method is quite straightforward and avoids paradoxical calls like geom_smooth(se = FALSE, level = .95). Are there drawbacks to using NULL as an option for a function argument, as done in my_smooth()? I.e., is it bad practice to use NULL to mean "do not apply this argument"?
CodePudding user response:
Your my_smooth function requires the user to specify lev. The geom_smooth approach allows the user to accept the defaults for the se and level arguments, or to change just one of them.
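To make the difference concrete, here is a small sketch. It reproduces the question's wrapper (only the ggplot2:: prefix is added so the block is self-contained) and shows that calling it without lev fails, whereas geom_smooth() accepts zero, one, or both of its arguments. The ggplot2 calls are guarded, on the assumption that the package may not be installed.

```r
# The wrapper from the question: `lev` has no default value.
my_smooth <- function(lev, ...) {
  if (is.null(lev)) {
    ggplot2::geom_smooth(se = FALSE)
  } else {
    ggplot2::geom_smooth(se = TRUE, level = lev)
  }
}

# Calling it with no arguments fails as soon as `lev` is evaluated.
res <- tryCatch(my_smooth(), error = function(e) conditionMessage(e))
print(res)

# geom_smooth() lets the caller keep both defaults or override just one.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  layers <- list(
    ggplot2::geom_smooth(),             # keep both defaults
    ggplot2::geom_smooth(se = FALSE),   # change only se
    ggplot2::geom_smooth(level = 0.9)   # change only level
  )
}
```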
The se and level arguments are also easier to document. Your lev argument means two things: whether to plot the band, and how to plot it. You're lucky that in this case the choice of "don't plot it" doesn't require a numerical parameter, but in other situations where one argument represents a binary choice, other arguments may apply to both outcomes of that choice. The geom_smooth design is consistent with those situations in giving each parameter a single clear meaning.
The fact that one parameter is sometimes irrelevant is a small cost: you don't pay (mentally) for each parameter, you pay for each decision. You're still making two decisions, so your solution is no cheaper.
There are other situations where having one parameter is better than two. For example, R allows you to specify the default for one parameter based on the value of another. The dgamma/rgamma etc. functions use this to let you specify either the rate or the scale of the distribution, but not both (since rate * scale = 1). This is thought to be convenient because some people are used to working with one and other people with the other, but I think it just confuses everyone, makes the documentation more complicated, and makes people wonder why rnorm doesn't allow you to specify the variance instead of the standard deviation.
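The default-depends-on-another-argument pattern can be sketched as follows. The first part uses dgamma from base R; gamma_mean is a hypothetical function written only to illustrate the pattern, and it makes no attempt to reject calls that supply both rate and scale.

```r
# The same density value, parameterised once by rate and once by
# scale = 1 / rate.
d_rate  <- dgamma(1, shape = 2, rate = 2)
d_scale <- dgamma(1, shape = 2, scale = 0.5)
print(all.equal(d_rate, d_scale))

# Hypothetical function using the same pattern: the default for `scale`
# is computed lazily from whatever `rate` turns out to be.
gamma_mean <- function(shape, rate = 1, scale = 1 / rate) {
  shape * scale
}

print(gamma_mean(2, rate = 4))    # scale defaults to 1 / 4
print(gamma_mean(2, scale = 3))   # rate is never used

# Caveat of this sketch: if both are given, `rate` is silently ignored.
print(gamma_mean(2, rate = 4, scale = 3))
```

The last call hints at the documentation burden described above: two arguments that describe one quantity need an extra rule for what happens when they disagree.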