How do I create a function that would take a textfile, two logical operators for comment and blank l

Time:12-13

I need to write a function named read_text_file.

It takes an argument textFilePath, a single character string, and two optional parameters, withBlanks and withComments, each a single logical value.

textFilePath is the path to the text file (or R script). If withBlanks is set to FALSE, read_text_file() returns the file without blank lines (i.e. lines that contain nothing or only whitespace); if withComments is set to FALSE, it returns the file without comment lines (i.e. lines that start with “#”).

It outputs a character vector of length n, where each element corresponds to one line of text/code.

I came up with the function below:

read_text_file <- function(textFilePath, withBlanks = TRUE, withComments = TRUE){
  # check that `textFilePath`: character(1)
  if(!is.character(textFilePath) | length(textFilePath) != 1){
    stop("`textFilePath` must be a character of length 1.")}
  
  if(withComments==FALSE){
    return(grep('^$', readLines(textFilePath),invert = TRUE, value = TRUE))
  }
  
  if(withBlanks==FALSE){
    return(grep('^#', readLines(textFilePath),invert = TRUE, value = TRUE))
  } 

  return(readLines(textFilePath))
}

When both options are FALSE, the second if-statement returns immediately, so the third if-statement is never executed.

CodePudding user response:

I'd recommend processing an imported object instead of returning it immediately:

read_text_file <- function(textFilePath, withBlanks = TRUE, withComments = TRUE){
  # check that `textFilePath`: character(1)
  if(!is.character(textFilePath) || length(textFilePath) != 1){
    stop("`textFilePath` must be a character of length 1.")
  }

  result = readLines(textFilePath)
  if(!withComments){
    result = grep('^\\s*#\\s*', result, invert = TRUE, value = TRUE)
  }
  
  if(!withBlanks){
    result = grep('^\\s*$', result, invert = TRUE, value = TRUE)
  } 

  result
}

The big change is defining a result object that we modify as needed and return at the end. This is better for two reasons: (a) it is more concise, since readLines() is called only once, and (b) it lets you apply zero, one, or several cleaning steps to result before returning it.
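To check how the filters combine, here is a small sketch that writes a throwaway file with tempfile() and calls the function with each flag combination. The file contents and the variable names (path, all_lines, and so on) are made up for illustration, and the function is repeated in compact form only so the snippet runs on its own.

```r
# Compact copy of the function above so this snippet is self-contained.
read_text_file <- function(textFilePath, withBlanks = TRUE, withComments = TRUE) {
  if (!is.character(textFilePath) || length(textFilePath) != 1) {
    stop("`textFilePath` must be a character of length 1.")
  }
  result <- readLines(textFilePath)
  if (!withComments) {
    # drop lines whose first non-whitespace character is '#'
    result <- grep("^\\s*#", result, invert = TRUE, value = TRUE)
  }
  if (!withBlanks) {
    # drop empty and whitespace-only lines
    result <- grep("^\\s*$", result, invert = TRUE, value = TRUE)
  }
  result
}

# Hypothetical sample file, written to a temporary location.
path <- tempfile(fileext = ".R")
writeLines(c("# a comment", "x <- 1", "", "   ", "  # indented comment", "y <- 2"), path)

all_lines   <- read_text_file(path)                        # nothing removed
no_comments <- read_text_file(path, withComments = FALSE)  # blanks kept
no_blanks   <- read_text_file(path, withBlanks = FALSE)    # comments kept
code_only   <- read_text_file(path, withBlanks = FALSE, withComments = FALSE)

print(code_only)
```

Because each filter only reassigns result, setting both flags to FALSE applies both filters in turn, which is the case the early return() calls in the original version could not handle.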

I also made some minor changes:

  1. I don't use return(). It is only needed if you are returning something before the end of the function body, which is no longer necessary with these modifications.
  2. You had your "comment" and "blank" regex patterns switched; I corrected that.
  3. I changed == FALSE to !, which is a little safer and good practice. You could use isFALSE() if you want more readability.
  4. I added \\s* to your regex patterns in a couple of places. It matches any amount of whitespace (including none), so indented comments and whitespace-only lines are caught too.
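The regex and isFALSE() points can be checked directly in the console. The snippet below is only a sketch: the sample strings and variable names are invented for illustration, and isFALSE() assumes R 3.5.0 or later.

```r
# Hypothetical sample lines covering the interesting cases.
sample_lines <- c("# comment", "   # indented comment",
                  "x <- 1  # trailing comment", "", "   ", "\t")

# '^\\s*#' flags lines whose first non-whitespace character is '#'.
is_comment <- grepl("^\\s*#", sample_lines)

# '^\\s*$' flags lines that are empty or contain only whitespace.
is_blank <- grepl("^\\s*$", sample_lines)

print(data.frame(line = sample_lines, is_comment, is_blank))

# The trailing \\s* in '^\\s*#\\s*' does not change which lines match.
same_matches <- identical(grepl("^\\s*#\\s*", sample_lines), is_comment)

# isFALSE() is TRUE only for a single, non-NA FALSE, so an NA flag is
# treated as "not FALSE", whereas if (!NA) stops with an error.
na_result <- tryCatch(if (!NA) "ran" else "skipped",
                      error = function(e) "error")
print(c(isFALSE(FALSE), isFALSE(NA), isFALSE(TRUE)))
print(na_result)
```

Note that a line such as "x <- 1  # trailing comment" is not treated as a comment line, since the pattern is anchored at the start of the line.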