How to rename the columns that are similar to current one plus one?

Time:10-04

Could you help me with this problem? I have a dataset whose column names are numeric values. Some of the columns are sequential, meaning the integer part of each name is one more than that of the previous column. I would like to rename each sequential column to the name of the column where its sequence started.

Here is an example dataset similar to mine:

fake_dataset <- data.frame(sample = paste0("sample_", sample(1:100, replace = TRUE)),
                           "1678.47647" = runif(100, 1, 2),
                           "1679.84733" = runif(100, 1, 3),
                           "1680.87487" = runif(100, 2, 4),
                           "1800.35463" = runif(100, 1, 2),
                           "1811.47463" = runif(100, 2, 3),
                           "1823.52342" = runif(100, 2, 5)
                           )

colnames(fake_dataset) <- c("sample",
                            "1678.47647",
                            "1679.84733",
                            "1680.87487",
                            "1800.35463",
                            "1811.47463",
                            "1823.52342")

fake_dataset$sample <- NULL

My logic was to rename each sequential column to the same name as the previous one, like this:

test <- function(data){
  new_names <- c()
  counter <- 0
  for (i in as.integer(colnames(fake_dataset))){
    counter <- counter + 1
    if(as.character( as.integer( names( data[counter] ) )) == as.character( as.integer( names( data[counter] ) ) + 1) ) {
      print("same!\n")
      colname( data[, counter]) <- colnames( data[, counter + 1])
    }else{
      print("different!\n")
    }
  }
}

But I haven't managed to get it working yet. Could anyone help? Thank you for your time.

CodePudding user response:

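We may convert the column names to integer (which truncates the decimal part), take the difference between adjacent elements to build a grouping variable (a new group starts wherever the difference is not 1), use that in ave to select the first element of each group, and assign the result back as the column names.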

v1 <- as.integer(colnames(fake_dataset))
grp <- cumsum(c(TRUE, diff(v1) != 1))
new <- ave(v1, grp, FUN = function(x) x[1])

colnames(fake_dataset) <- new

-output

> colnames(fake_dataset)
[1] "1678" "1678" "1678" "1800" "1811" "1823"
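
To see how the grouping works, the same steps can be run on the column names alone. This is only a minimal sketch: first_of_run and its keep_original argument are hypothetical names introduced here, not part of the answer above. With keep_original = TRUE it returns the full original name of the first column in each run (for example "1678.47647") instead of the truncated integer, which may be closer to what the question asks for.

```r
# Hypothetical helper (not from the original answer): map each column name
# to the first name in its run of consecutive integer parts.
first_of_run <- function(nms, keep_original = FALSE) {
  v <- as.integer(nms)                  # "1678.47647" -> 1678 (truncated)
  grp <- cumsum(c(TRUE, diff(v) != 1))  # start a new group when the step is not 1
  first <- match(grp, grp)              # index of the first element of each group
  if (keep_original) nms[first] else as.character(v[first])
}

nms <- c("1678.47647", "1679.84733", "1680.87487",
         "1800.35463", "1811.47463", "1823.52342")

first_of_run(nms)
# [1] "1678" "1678" "1678" "1800" "1811" "1823"

first_of_run(nms, keep_original = TRUE)
# [1] "1678.47647" "1678.47647" "1678.47647" "1800.35463" "1811.47463" "1823.52342"
```

The result could then be assigned with colnames(fake_dataset) <- first_of_run(colnames(fake_dataset)).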

NOTE: data.frame, tibble and data.table do not fully support duplicate column names. In subsequent transformations the names may be changed to unique values by make.unique, i.e. by appending .1, .2 to the duplicates. For a matrix, however, duplicate column names are allowed.
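
As a small illustration of that note, the sketch below shows what make.unique does to repeated names and that a matrix keeps them unchanged. The vector dup and the matrix m are made-up examples, not data from the question.

```r
# Made-up example names with duplicates, as produced by the renaming above
dup <- c("1678", "1678", "1678", "1800")

# make.unique appends .1, .2, ... to the repeated entries
make.unique(dup)
# [1] "1678"   "1678.1" "1678.2" "1800"

# A matrix keeps duplicate column names as they are
m <- matrix(1:8, nrow = 2, dimnames = list(NULL, dup))
colnames(m)
# [1] "1678" "1678" "1678" "1800"
```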
