Removing numbers and characters from column names in R

I'm trying to remove specific numbers and characters from the column names of a data frame in R, but I am only able to remove the numbers. I have tried several approaches, but the characters at the end are still kept.

Each column name consists of letters followed by a number in parentheses, e.g. ASE (232).

DataFrame

Subject ASE (232) ASD (121) AFD (313)
   1        1.1.     1.2     1.3

Desired Data Frame

Subject ASE ASD AFD
   1    1.1 1.2 1.3

Code

colnames(data) <- gsub("[A-Z] ([0-9]+)", "", colnames(data))

CodePudding user response：

You can do this:

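sub("(\\w+).*", "\\1", colnames(data))

This captures the leading run of word characters (\\w+, i.e. letters, digits, and underscores) in a group, matches the rest of the string with .*, and uses the backreference \\1 in sub's replacement argument to replace the whole string with just the captured part. Assign the result back to colnames(data) to rename the columns.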
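As a quick check of the capture-group approach, the sketch below applies it to the column names from the question. The vector name nms is only an illustration and not part of the original post.

```r
# Column names as given in the question (nms is a hypothetical helper name)
nms <- c("Subject", "ASE (232)", "ASD (121)", "AFD (313)")

# (\\w+) captures the leading word characters; .* consumes everything after them
cleaned <- sub("(\\w+).*", "\\1", nms)
print(cleaned)
```

Names that contain no space or parenthesis, such as Subject, pass through unchanged because .* is allowed to match nothing.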

CodePudding user response：

We may change the code to match one or more spaces (\\s+), followed by the opening parenthesis (\\(), one or more digits (\\d+), and any remaining characters (.*), and replace the match with an empty string ("").

colnames(data) <- sub("\\s+\\(\\d+.*", "", colnames(data))
colnames(data)
[1] "Subject" "ASE"     "ASD"     "AFD"

Another option is trimws from base R (the whitespace argument requires R 3.6.0 or later):

trimws(colnames(data), whitespace = "\\s+\\(.*")
[1] "Subject" "ASE"     "ASD"     "AFD"

In the OP's code, the pattern matches an uppercase letter followed by a space, but the ( is a regex metacharacter and is not escaped. In regex mode, ([0-9]+) is therefore read as a capture group around the digits rather than as literal parentheses. The pattern expects digits directly after the space, whereas the column names have a literal ( there, so nothing matches and the strings are returned unchanged.

gsub("[A-Z] ([0-9]+)", "", colnames(data))
[1] "Subject"   "ASE (232)" "ASD (121)" "AFD (313)"
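To make the difference concrete, here is a minimal sketch comparing the unescaped pattern with one possible escaped variant. The escaped pattern and the names nms, unescaped, and escaped are illustrative assumptions, not code from the original post.

```r
nms <- c("Subject", "ASE (232)", "ASD (121)", "AFD (313)")

# Unescaped: ( ) form a capture group, so the regex looks for
# "uppercase letter, space, digits", which never occurs in these names
unescaped <- gsub("[A-Z] ([0-9]+)", "", nms)

# Escaped: \\( and \\) match literal parentheses; the leading [A-Z] is dropped
# because keeping it would also delete the last letter of each name
escaped <- gsub(" \\([0-9]+\\)", "", nms)

print(unescaped)
print(escaped)
```

The first result should be identical to the input, and the second should contain only the bare names.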

data

data <- structure(list(Subject = 1L, `ASE (232)` = "1.1.", `ASD (121)` = 1.2, 
    `AFD (313)` = 1.3), class = "data.frame", row.names = c(NA, 
-1L))

CodePudding user response：

We could use word from the stringr package along with rename_with from dplyr, which keeps only the first space-separated word of each column name:

library(stringr)
library(dplyr)
data %>% 
  rename_with(~word(., 1))

  Subject  ASE ASD AFD
1       1 1.1. 1.2 1.3
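One caveat with keeping only the first word: a column name that itself contains a space would be cut short. The base R sketch below imitates the first-word behaviour with sub(" .*", "", ...) rather than calling stringr, and compares it with removing only the parenthesised suffix. The name "Body Mass (12)" is a made-up example, not from the question.

```r
# "Body Mass (12)" is a hypothetical multi-word column name
nms <- c("Subject", "ASE (232)", "Body Mass (12)")

# Keep only the text before the first space (imitates taking the first word)
first_word <- sub(" .*", "", nms)

# Remove only the trailing " (digits...)" part, leaving other spaces intact
suffix_removed <- sub("\\s+\\(\\d+.*", "", nms)

print(first_word)
print(suffix_removed)
```

For the column names in the question both approaches give the same result; they differ only when a name has more than one word before the parentheses.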