Custom function to rename all columns

Time:02-10

I want to manipulate the names of all the columns in a dataframe with this function that I wrote:

clean_names <- function(df) {
  names(df) <- tolower(names(df))
  names(df) <- gsub('\\s', '\\_', names(df))
  names(df) <- gsub('\\(|\\)|\\/|,|\\.', '\\_', names(df))
  names(df) <- gsub('(\\_)\\_', '\\1', names(df))
  names(df) <- gsub('\\_$', '', names(df))
}

However, when I call it, nothing happens: there is no error, but the column names don't change. What's the problem here?

I suspect the problem is that I'm only assigning things and not returning anything. But in this case I don't want to return a value, I just want to change the column names.

The only parameter here is df and I'm calling the names() function multiple times. Shouldn't this work? Any help is appreciated!

CodePudding user response:

Two things here:

  1. R generally does not operate by side effect. You can pass a data.frame into a function, but as soon as the function modifies it, R copies df into a new object local to the function, and that copy is discarded when the function returns. The original frame is untouched. Some functions in R do operate by side effect, but most do not, so you cannot make changes inside a function and expect them to take effect outside it. Instead, assign the result back to the frame, as in:

    mydata <- clean_names(mydata)
    
  2. When a function has no explicit return(.) statement, R returns the value of the last expression evaluated (invisibly, if that expression is an assignment). You will often see functions end with the desired object (df here) rather than a call to return; return is useful in some circumstances, such as exiting early, but usually not needed.

    Here the last expression is an assignment, so its value is returned invisibly. You can see what is really being returned by capturing the return value in a new variable or, as a shortcut, by wrapping the call in parentheses: (clean_names(mydata)). The output from that function is a character vector, not a data frame.

    Why? Because the last expression is a reassignment of names. The RHS of that assignment is producing a character vector, and that is passed to the `names<-` function on the LHS, and that value (the vector of strings) is then used as the return value of the function.

    The resolution here is to add df (or return(df) if you must) to the end of your function, as in:

    clean_names <- function(df) {
      names(df) <- tolower(names(df))
      names(df) <- gsub('\\s', '\\_', names(df))
      names(df) <- gsub('\\(|\\)|\\/|,|\\.', '\\_', names(df))
      names(df) <- gsub('(\\_)\\_', '\\1', names(df))
      names(df) <- gsub('\\_$', '', names(df))
      df
    }
    

After making both of those changes, you should get the data frame back with the cleaned column names.
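
To make the two points concrete, here is a minimal sketch. The data frame `mydata`, its column names, and the function names `clean_names_broken` and `clean_names_fixed` are made up for illustration, and the functions are shortened to two steps; the behavior should be the same with the full set of gsub calls.

```r
# Hypothetical example data; the column names are invented for this sketch.
mydata <- data.frame(a = 1:2, b = 3:4)
names(mydata) <- c("First Name", "Height (cm)")

# Shortened copy of the original function: the last expression is an assignment.
clean_names_broken <- function(df) {
  names(df) <- tolower(names(df))
  names(df) <- gsub("\\s", "_", names(df))
}

# Calling it without reassigning leaves the caller's data frame unchanged.
clean_names_broken(mydata)
names(mydata)
#> [1] "First Name"  "Height (cm)"

# Parentheses reveal the invisible return value: the character vector
# from the right-hand side of the last assignment, not a data frame.
(result <- clean_names_broken(mydata))
#> [1] "first_name"  "height_(cm)"
is.data.frame(result)
#> [1] FALSE

# Fixed version: end with df, then assign the result back to the frame.
clean_names_fixed <- function(df) {
  names(df) <- tolower(names(df))
  names(df) <- gsub("\\s", "_", names(df))
  df
}

mydata <- clean_names_fixed(mydata)
names(mydata)
#> [1] "first_name"  "height_(cm)"
```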

CodePudding user response:

From the names documentation:

For names<-, the updated object. (Note that the value of names(x) <- value is that of the assignment, value, not the return value from the left-hand side.)

Therefore you should try:

clean_names <- function(df) {
  names(df) <- tolower(names(df))
  names(df) <- gsub('\\s', '\\_', names(df))
  names(df) <- gsub('\\(|\\)|\\/|,|\\.', '\\_', names(df))
  names(df) <- gsub('(\\_)\\_', '\\1', names(df))
  names(df) <- gsub('\\_$', '', names(df))
  return(df)
}
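
As a side note, the pattern '(\\_)\\_' only collapses underscores in pairs, so a run of three, such as the one "a , b" produces, still leaves two behind. A possible tightening, offered as a sketch rather than a drop-in replacement, works on a local vector and collapses any run of underscores. The name `clean_names2` and the example column names are hypothetical.

```r
clean_names2 <- function(df) {
  nms <- tolower(names(df))
  # Replace whitespace and the characters ( ) / , . with an underscore.
  nms <- gsub("[[:space:]()/,.]", "_", nms)
  # Collapse any run of underscores into a single one.
  nms <- gsub("_+", "_", nms)
  # Drop a single trailing underscore, if present.
  nms <- sub("_$", "", nms)
  names(df) <- nms
  # Return the modified data frame so the caller can reassign it.
  df
}

# Hypothetical usage:
example_df <- data.frame(x = 1, y = 2)
names(example_df) <- c("Height (cm)", "A , B/C")
example_df <- clean_names2(example_df)
names(example_df)
#> [1] "height_cm" "a_b_c"
```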