What should I return in a function for writing files to disk?

Time:11-03

I am making a simple R function to write files on disk consistently in the same way, but in different folders:

library(magrittr)

main_path <- here::here()

write_to_disk <- function(data, folder, name){
     data %>%
     vroom::vroom_write(
          file.path(main_path, folder, paste0(name, ".tsv"))
     )
}

I know I don't necessarily need to return anything in R functions, but if I were to, what would be the appropriate return() statement here?

Thanks a lot

CodePudding user response:

This is prone to opinion and context, to be honest, but some thoughts:

  1. Return the original data. This spirit is conveyed in most of the tidyverse verb-functions and many other packages (and some in base R). If you're using either the %>% or |> pipes, doing this allows processing on the data after your function, which might be very convenient.

    write_to_disk <- function(data, folder, name){
      vroom::vroom_write(
        data,
        file.path(folder, paste0(name, ".tsv"))
      )
      data
    }
    

    (Your function is implicitly doing this already, since the call to vroom::vroom_write is the last expression in your function body.)

  2. Return the output from the file-writing call. I don't like this as much, frankly, because if you ever change which function your wrapper uses, the return value of your function may very well change. I don't know the life-cycle expectancy of your wrapper function, but imagine you switch from vroom::vroom_write to another writer: vroom_write returns its input data (invisibly), whereas a replacement may return something else entirely (NULL, say, or the file path), which may break assumptions of downstream processing.

    write_to_disk <- function(data, folder, name){
      out <- vroom::vroom_write(
        data,
        file.path(folder, paste0(name, ".tsv"))
      )
      out
    }
    

    Note: I explicitly chose to capture into out and return it in case you ever add code between vroom::vroom_write and subsequent out. Your original function unchanged is in effect doing the same thing, but if you choose to do anything post-vroom_write, then this extra step would be necessary.

    Otherwise, your function is implicitly doing this already, since vroom::vroom_write returns the data.

  3. Return the filename. This is only useful if the filename is not necessarily known a priori. For instance, if your wrapper takes care not to overwrite same-named files, it might add a counter (before the extension) so that nothing is ever overwritten. In that case the calling environment does not know which filename was chosen, so returning it has value (sometimes).

    write_to_disk <- function(data, folder, name){
      # file numbering
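      # matches name.tsv, name_1.tsv, name_2.tsv, ... (assumes name has no regex metacharacters)
      re <- paste0("^", name, "_?([0-9]+)?\\.tsv$")
      existfiles <- list.files(folder, pattern = re, full.names = TRUE)
      nextnum <- max(0L, suppressWarnings(as.integer(gsub(re, "\\1", basename(existfiles)))), na.rm = TRUE)
      # add a counter whenever a same-named file (numbered or not) already exists
      if (length(existfiles) > 0) {
        name <- sprintf("%s_%i", name, nextnum + 1L)
      }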
      filename <- file.path(folder, paste0(name, ".tsv"))
      vroom::vroom_write(
        data,
        filename
      )
      filename
    }
    

    (The "file numbering" code is offered only as an example of why I think returning the filename might make sense.)

  4. Return the success of the writing function. This would likely require using try or tryCatch (or any of the tidyverse equivalents), catching errors, and reacting accordingly.

    write_to_disk <- function(data, folder, name){
      res <- tryCatch(
        vroom::vroom_write(
          data,
          file.path(folder, paste0(name, ".tsv"))
        ),
        error = function(e) e
      )
      out <- !inherits(res, "error")
      if (!out) {
        attr(out, "error") <- conditionMessage(res)
      }
      out
    }
    
  5. Return nothing. This is the easiest, certainly. You'd need to do it explicitly so that you don't inadvertently return the return-value from the file-writing function.

    write_to_disk <- function(data, folder, name){
      vroom::vroom_write(
        data,
        file.path(folder, paste0(name, ".tsv"))
      )
      NULL
    }
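
To make the difference between these options concrete, here is a minimal sketch of how calling code might consume two of the return values: the data (option 1) and the success flag (option 4). It swaps in base R's write.table for vroom::vroom_write so that it runs without extra packages. The names write_tsv_base, write_and_return_data and write_and_return_status are hypothetical and only for illustration.

```r
# Stand-in writer (assumption): base R instead of vroom::vroom_write
write_tsv_base <- function(data, path) {
  utils::write.table(data, path, sep = "\t", row.names = FALSE, quote = FALSE)
}

# Option 1: return the data, so the call can sit in the middle of a pipeline
write_and_return_data <- function(data, folder, name) {
  write_tsv_base(data, file.path(folder, paste0(name, ".tsv")))
  data
}

# Option 4: return TRUE/FALSE, with the error message attached on failure
write_and_return_status <- function(data, folder, name) {
  res <- tryCatch(
    write_tsv_base(data, file.path(folder, paste0(name, ".tsv"))),
    error = function(e) e
  )
  out <- !inherits(res, "error")
  if (!out) {
    attr(out, "error") <- conditionMessage(res)
  }
  out
}

out_dir <- tempdir()
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Returning the data lets processing continue after the write
n_rows <- nrow(write_and_return_data(df, out_dir, "demo"))

# Returning the status lets the caller decide how to react to a failure
ok <- write_and_return_status(df, out_dir, "demo")
bad <- suppressWarnings(
  write_and_return_status(df, file.path(out_dir, "no_such_dir"), "demo")
)
if (!isTRUE(bad)) {
  message("write failed: ", attr(bad, "error"))
}
```

Which of the two is more convenient depends on whether the caller mostly wants to keep working with the data or mostly wants to know whether the file landed on disk.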
    

Notes:

  1. Your use of main_path is counter to functional programming, since the function behaves differently given identical inputs based on the presence of something outside of its immediate scope. I argue it's better to call write_to_disk(x, file.path(main_path, folder), "somename"), since main_path is defined in that calling environment (not within the function), and your function would then be general enough not to require that the variable be defined correctly.

    I've updated all of the code above to reflect this good practice. If you feel strongly enough against this, feel free to add back in main_path in your preferred locations.

  2. It might be useful for any of the above to be returned invisibly, so that (for instance) saving a large data.frame without capturing its return value does not flood the console with the data. This is easy enough to do with invisible(data) and changes nothing of the return value (other than that it is not printed on the console by default).

  3. FYI: Konrad and I have gone back-and-forth in the comments about whether return(.) is a good idea or not. I don't disagree with most of the claims, and I argue that it is as much about style and opinion as anything else. In any case, since most of my arguments for return are moot in all of the above code, I removed it for succinctness.
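
Putting notes 1 and 2 together, a minimal sketch might look like the following. It again uses base R's write.table as a stand-in for vroom::vroom_write, and tempdir() as a stand-in for here::here(), so that it runs anywhere. The "results" folder name is made up for the example.

```r
# The wrapper takes the full folder path and returns the data invisibly
write_to_disk <- function(data, folder, name) {
  path <- file.path(folder, paste0(name, ".tsv"))
  # stand-in writer (assumption): base R instead of vroom::vroom_write
  utils::write.table(data, path, sep = "\t", row.names = FALSE, quote = FALSE)
  invisible(data)
}

main_path <- tempdir()                      # stand-in for here::here()
out_dir <- file.path(main_path, "results")  # hypothetical folder name
dir.create(out_dir, showWarnings = FALSE)

df <- data.frame(id = 1:2, value = c(0.5, 1.5))

# Called on its own, nothing is printed to the console
write_to_disk(df, out_dir, "somename")

# The return value is still there when it is captured or piped onward
written <- write_to_disk(df, out_dir, "somename")

# withVisible() reports whether a value would have been auto-printed
vis <- withVisible(write_to_disk(df, out_dir, "somename"))
```

The caller builds the path from main_path, so the function itself no longer depends on any variable outside its arguments.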
