Column name string rearrange in R

Time:09-23

I want to rearrange the column names:

For example, the column names are 2009sum, 2010sum, 2011sum, and so on. I want to change the names to sum2009, sum2010, sum2011.

I have tried the following code in R, but it's not working.

colnames(dataframe) <- gsub("(\\W+)(\\w+)", "\\2\\1", colnames(dataframe))

CodePudding user response:

Or you can use this one:

vec <- c("2009sum", "2010sum", "2011sum")

gsub("(^[0-9]+)([[:alpha:]]+$)", "\\2\\1", vec, perl = TRUE)

[1] "sum2009" "sum2010" "sum2011"
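
The same idea can be applied directly to the column names of a data frame. This is only a minimal sketch: the data frame `df` and its extra `id` column are hypothetical and not from the question. It shows that names which do not match the anchored pattern are left untouched.

```r
# Hypothetical data frame; `id` is an assumed extra column, not from the question
df <- data.frame(id = 1, `2009sum` = 10, `2010sum` = 20, check.names = FALSE)

# Anchored pattern: leading digits, then trailing letters, swapped via \\2\\1
colnames(df) <- sub("^([0-9]+)([[:alpha:]]+)$", "\\2\\1", colnames(df))

colnames(df)
#> [1] "id"      "sum2009" "sum2010"
```

Because sub() returns non-matching strings unchanged, only the digit-then-letters names are rearranged.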

CodePudding user response:

Here are a couple more ways:

x <- data.frame(matrix(ncol=3))

nam <- c("2009sum", "2010sum", "2011sum")
a <- gsub(pattern = "\\d{4}", replacement = "", x = nam)
a
#> [1] "sum" "sum" "sum"

b <- gsub(pattern = ".*(\\d{4}).*", replacement = "\\1", x = nam)
b
#> [1] "2009" "2010" "2011"

colnames(x) <- paste0(a, b)
x
#>   sum2009 sum2010 sum2011
#> 1      NA      NA      NA

colnames(x) <- sprintf("%s%s", a, b)
x
#>   sum2009 sum2010 sum2011
#> 1      NA      NA      NA
Created on 2021-09-22 by the reprex package (v2.0.1)

CodePudding user response:

This should work for you:

colnames(dataframe) <- paste0("sum", as.numeric(gsub("([0-9]+).*$", "\\1", names(dataframe))))

This renames your columns by pasting "sum" and the numeric component of each column name together, in that order.

Which gives us:

[1] "sum2009" "sum2010" "sum2011"

Dput:

structure(list(`2009sum` = 1, `2010sum` = 1, `2011sum` = 1), class = "data.frame", row.names = c(NA, 
-1L))
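
Put together with the dput data above, the approach might look like the sketch below. Two things here are my own choices rather than part of the answer: the intermediate variable `years` is introduced for readability, and the as.numeric() step is skipped on the assumption that the extracted year strings are already in the desired form.

```r
# Sample data from the dput output above
dataframe <- structure(list(`2009sum` = 1, `2010sum` = 1, `2011sum` = 1),
                       class = "data.frame", row.names = c(NA, -1L))

# Keep only the leading digits of each name (`years` is an illustrative name)
years <- sub("^([0-9]+).*$", "\\1", names(dataframe))

# Prepend the fixed prefix "sum" to each extracted year
colnames(dataframe) <- paste0("sum", years)

colnames(dataframe)
#> [1] "sum2009" "sum2010" "sum2011"
```

Note that the prefix is hard-coded, so this only suits data frames where every column shares the "sum" suffix.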

CodePudding user response:

You're actually not far from the right (and most elegant) solution, which uses two backreferences:

colnames(dataframe) <- sub("(\\d+)(sum)", "\\2\\1", colnames(dataframe))

Why does colnames(dataframe) <- gsub("(\\W+)(\\w+)", "\\2\\1", colnames(dataframe)) not work? A couple of things here:

  • gsub is possible but not necessary; sub suffices because there is just one match per string
  • \\W is a negated character class: it matches anything that is not a letter, a digit, or an underscore. That is wrong here because it is the four digits at the beginning of the string that you want to match
  • \\w (with lower-case 'w') is the positive counterpart that matches exactly what \\W does not, i.e., letters, digits, and the underscore. That is wrong here because the second group should match letters only, not digits
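
The points above can be checked on the sample names. This is a small sketch; the vector name `nam` is borrowed from an earlier answer and is otherwise just illustrative.

```r
nam <- c("2009sum", "2010sum", "2011sum")

# The names contain only word characters, so \\W has nothing to match
grepl("\\W", nam)
#> [1] FALSE FALSE FALSE

# With no match, gsub() returns the input unchanged
gsub("(\\W+)(\\w+)", "\\2\\1", nam)
#> [1] "2009sum" "2010sum" "2011sum"

# \\d+ captures the digits and (sum) the letters; the replacement swaps them
sub("(\\d+)(sum)", "\\2\\1", nam)
#> [1] "sum2009" "sum2010" "sum2011"
```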