I am making a simple R function to write files on disk consistently in the same way, but in different folders:
library(magrittr)
main_path <- here::here()
write_to_disk <- function(data, folder, name){
  data %>%
    vroom::vroom_write(
      file.path(main_path, folder, paste0(name, ".tsv"))
    )
}
I know I don't necessarily need to return anything in R functions, but if I were to, what would be the appropriate return() statement here?
Thanks a lot
CodePudding user response:
This is prone to opinion and context, to be honest, but some thoughts:
Return the original data. This spirit is conveyed in most of the tidyverse verb-functions and many other packages (and some in base R). If you're using either the %>% or |> pipes, doing this allows processing on the data after your function, which might be very convenient.
write_to_disk <- function(data, folder, name){
  vroom::vroom_write(
    data,
    file.path(folder, paste0(name, ".tsv"))
  )
  data
}
(Your function is implicitly doing this already, since the call to vroom::vroom_write is the last expression in your function body.)
Return the output from the file-writing call. I don't like this as much, frankly, because if you ever change which function your wrapper uses, then the return value of your function may very well change. I don't know the life-cycle expectancy of your wrapper function, but imagine if you choose to switch from vroom::vroom to another function; vroom returns a subset of the data based on col_select, and perhaps the newer function will return the whole data, which may break assumptions of downstream processing.
write_to_disk <- function(data, folder, name){
  out <- vroom::vroom_write(
    data,
    file.path(folder, paste0(name, ".tsv"))
  )
  out
}
Note: I explicitly chose to capture into out and return it in case you ever add code between vroom::vroom_write and the subsequent out. Your original function, unchanged, is in effect doing the same thing, but if you choose to do anything after vroom_write, then this extra step would be necessary. Otherwise, your function is implicitly doing this already, since vroom::vroom_write returns the data.
Return the filename. This is only useful if the filename is not necessarily known a priori. For instance, if your wrapper takes care not to overwrite same-named files, it might add a counter (pre-extension) so that overwriting never happens. In that case, the calling environment does not know what the chosen filename is, so returning it has value (sometimes).
write_to_disk <- function(data, folder, name){
  # file numbering: look for existing "name.tsv", "name_1.tsv", ... in folder
  re <- paste0("^", name, "_?([0-9]+)?\\.tsv$")
  existfiles <- list.files(folder, pattern = re, full.names = TRUE)
  nextnum <- max(0L, suppressWarnings(as.integer(gsub(re, "\\1", basename(existfiles)))), na.rm = TRUE)
  # add a counter whenever any matching file exists, including an unnumbered one
  if (length(existfiles) > 0) {
    name <- sprintf("%s_%i", name, nextnum + 1L)
  }
  filename <- file.path(folder, paste0(name, ".tsv"))
  vroom::vroom_write(
    data,
    filename
  )
  filename
}
(The "file numbering" code is offered only as an example of why I think returning the filename might make sense.)
Return the success of the writing function. This would likely require using try or tryCatch (or any of the tidyverse equivalents), catching errors, and reacting accordingly.
write_to_disk <- function(data, folder, name){
  res <- tryCatch(
    vroom::vroom_write(
      data,
      file.path(folder, paste0(name, ".tsv"))
    ),
    error = function(e) e
  )
  out <- !inherits(res, "error")
  if (!out) {
    attr(out, "error") <- conditionMessage(res)
  }
  out
}
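A caller could then branch on that logical value and read the message from the attribute. The sketch below is only an illustration under a few assumptions: it swaps in base R's write.table for vroom::vroom_write so that it runs without extra packages, and the name write_to_disk_checked and the temporary directories are made up for this example.

```r
# Hypothetical variant of the wrapper; write.table() stands in for
# vroom::vroom_write() so the sketch needs only base R.
write_to_disk_checked <- function(data, folder, name) {
  res <- tryCatch(
    suppressWarnings(
      write.table(
        data,
        file.path(folder, paste0(name, ".tsv")),
        sep = "\t", row.names = FALSE, quote = FALSE
      )
    ),
    error = function(e) e
  )
  out <- !inherits(res, "error")
  if (!out) {
    attr(out, "error") <- conditionMessage(res)
  }
  out
}

good_dir <- tempfile("out_")
dir.create(good_dir)
bad_dir <- file.path(good_dir, "does_not_exist")  # never created

ok <- write_to_disk_checked(head(mtcars), good_dir, "cars")
failed <- write_to_disk_checked(head(mtcars), bad_dir, "cars")

# The caller decides how to react; here it just reports the message.
if (!failed) {
  message("write failed: ", attr(failed, "error"))
}
```

Attaching the message as an attribute keeps the return value a plain logical, so it still works directly inside if().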
Return nothing. This is the easiest, certainly. You'd need to do it explicitly so that you don't inadvertently return the return-value from the file-writing function.
write_to_disk <- function(data, folder, name){
  vroom::vroom_write(
    data,
    file.path(folder, paste0(name, ".tsv"))
  )
  NULL
}
Notes:
Your use of main_path is counter to functional programming, since the function behaves differently given identical inputs based on the presence of something outside of its immediate scope. I argue it's better to call write_to_disk(x, file.path(main_path, folder), "somename"), since main_path is defined in that environment (not within the function), and your function would be general enough to not require that variable be defined correctly. I've updated all of the code above to reflect this good practice. If you feel strongly enough against this, feel free to add main_path back in at your preferred locations.
It might be useful for any of the above to be returned invisibly, so that (for instance) saving a large data.frame without capturing its return value does not flood the console with the data. This is easy enough to do with invisible(data) and changes nothing of the return value (other than that it is not printed on the console by default).
FYI: Konrad and I have gone back-and-forth in the comments about whether return(.) is a good idea or not. I don't disagree with most of the claims, and argue regardless that it can be as much about style and opinion as anything else. Regardless, since most of my arguments for return are moot in all of the above code, I removed it for succinctness.
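To illustrate the note on invisible returns together with the first option (return the data), here is a minimal sketch. It assumes base R's write.table in place of vroom::vroom_write so it runs without extra packages, and the native |> pipe (R 4.1 or later); write_to_disk_quietly is a made-up name for this example.

```r
# Hypothetical wrapper that returns its input invisibly.
write_to_disk_quietly <- function(data, folder, name) {
  write.table(
    data,
    file.path(folder, paste0(name, ".tsv")),
    sep = "\t", row.names = FALSE, quote = FALSE
  )
  invisible(data)  # same value as returning data, just not auto-printed
}

out_dir <- tempfile("out_")
dir.create(out_dir)

# Called on its own, the data frame is not printed to the console.
write_to_disk_quietly(mtcars, out_dir, "cars")

# The data is still returned, so it can be piped onward or captured.
n_rows <- mtcars |>
  write_to_disk_quietly(out_dir, "cars_again") |>
  nrow()
```

The invisibility only affects auto-printing at the top level; assigning or piping the result behaves exactly as if data had been returned normally.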