Drawbacks of using NULL as function argument

Time:11-23

Consider geom_smooth() from ggplot2, where we can set whether we want to see confidence intervals (se argument) and how wide the interval is (level argument). For example:

df <- data.frame(x= rnorm(100), y= rnorm(100))
library(ggplot2)
ggplot(df, aes(x, y)) + geom_smooth(se = TRUE, level = .95)

I see no need for two separate arguments: if we set some level, we obviously want to see the confidence intervals, so in this case the se argument is redundant. On the other hand, if we choose se= FALSE, the level argument is redundant. Therefore, to me it is intuitive to combine both pieces of information in one argument. So my definition of the function would be something like this:

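my_smooth <- function(lev, ...) {
  if (is.null(lev)) {
    # NULL means "no confidence band"
    geom_smooth(se = FALSE, ...)
  } else {
    # any other value is used as the confidence level
    geom_smooth(se = TRUE, level = lev, ...)
  }
}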

So in my_smooth() there is one argument and we can either decide not to see confidence intervals by choosing NULL or we put in the level we want to see. Of course, we could add lev= .95 as default if we want.

In my opinion this method is quite straightforward and avoids paradoxical calls like geom_smooth(se= FALSE, level= .95). Are there drawbacks to using NULL as an option for a function argument, as done in my_smooth()? I.e., is it bad practice to use NULL to mean "do not apply this argument"?

CodePudding user response:

Your my_smooth function requires the user to specify lev. The geom_smooth approach allows the user to accept the defaults for the se and level arguments, or just change one of the defaults.
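To make that concrete, here is a minimal sketch in base R. The two functions are hypothetical stand-ins for my_smooth(): they return the settings that would be handed to geom_smooth() rather than building a layer, so the sketch runs without ggplot2. The second one assumes the default of .95 suggested in the question.

```r
# Hypothetical stand-in for my_smooth(): returns the settings that would be
# passed to geom_smooth() instead of creating a plot layer.
band_required <- function(lev) {
  if (is.null(lev)) list(se = FALSE) else list(se = TRUE, level = lev)
}

# Same logic, with the default level suggested in the question.
band_default <- function(lev = 0.95) {
  if (is.null(lev)) list(se = FALSE) else list(se = TRUE, level = lev)
}

# Without a default, a call with no argument fails as soon as lev is evaluated.
no_arg <- tryCatch(band_required(), error = function(e) "error")

with_default <- band_default()      # band on, at the default level
switched_off <- band_default(NULL)  # band off
```

With the default in place the user no longer has to supply lev, but the single argument still carries both the on/off choice and the level.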

The definition of the se and level arguments is also easier to document. Your lev argument means two things: whether to plot the band, and how to plot it. You're lucky that in this case the choice of "don't plot it" doesn't require a numerical parameter, but in other situations where one argument represents a binary choice, other arguments may apply to both branches of that choice. The geom_smooth design stays consistent with those situations because each parameter has a single, clear meaning.

The fact that one parameter is sometimes irrelevant is a small cost: you don't pay (mentally) for each parameter, you pay for each decision. You're still making two decisions, so your solution is no cheaper.

There are other situations where having one parameter is better than two. For example, R allows you to specify the default for one parameter based on the value of another. The dgamma/rgamma etc. functions use this to allow you to specify the rate or the scale of the distribution, but not both (since rate*scale = 1). This is thought to be convenient because some people are used to working with one and other people with the other, but I think it just confuses everyone, makes the documentation more complicated, and makes people wonder why rnorm doesn't allow you to specify the variance instead of the standard deviation.
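A short sketch of that mechanism, using only base R. dgamma is declared with scale = 1/rate as its default; rect_area is a hypothetical function added here just to show the same idea in isolation.

```r
# stats::dgamma is declared as
#   dgamma(x, shape, rate = 1, scale = 1/rate, log = FALSE)
# so the default for 'scale' is computed from 'rate'.
by_rate  <- dgamma(2, shape = 3, rate = 2)
by_scale <- dgamma(2, shape = 3, scale = 0.5)
same_density <- isTRUE(all.equal(by_rate, by_scale))

# Hypothetical function using the same mechanism: 'height' defaults to 'width'.
rect_area <- function(width, height = width) width * height
square <- rect_area(3)     # height falls back to width
oblong <- rect_area(3, 2)  # height given explicitly

# Contradictory values for rate and scale are rejected.
both <- tryCatch(dgamma(2, shape = 3, rate = 2, scale = 3),
                 error = function(e) "error")
```

The two parameterizations give the same density, and passing a rate and scale that do not multiply to 1 raises an error, which is the "but not both" behavior described above.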
