I'm trying to remove specific numbers and characters from the column names of a data frame in R, but I'm only able to remove the numbers. I have tried several approaches, but the other characters still remain at the end.
Each column name consists of letters followed by a number in parentheses, e.g. ASE (232)
DataFrame
Subject ASE (232) ASD (121) AFD (313)
1 1.1. 1.2 1.3
Desired Data Frame
Subject ASE ASD AFD
1 1.1 1.2 1.3
Code
colnames(data) <- gsub("[A-Z] ([0-9]+)", "", colnames(data))
CodePudding user response:
You can do this:
colnames(data) <- sub("(\\w+).*", "\\1", colnames(data))
This uses a capture group, (\\w+), to "remember" the first series of alphanumeric characters, and the backreference \\1 in sub's replacement argument to replace the whole string with just that remembered bit.
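To see what the capture group keeps, here is a minimal sketch on a plain character vector. The vector nms is a hypothetical stand-in for colnames(data), and the sketch assumes every name starts with its letters and has no space inside the label itself.

```r
# Hypothetical stand-in for colnames(data)
nms <- c("Subject", "ASE (232)", "ASD (121)", "AFD (313)")

# (\\w+) captures the leading run of word characters,
# .* consumes the rest of the string,
# and \\1 puts back only the captured part
cleaned <- sub("(\\w+).*", "\\1", nms)
print(cleaned)
```

Note that \\w also matches digits and underscores, so a name such as AS_2 (10) would keep AS_2 under this pattern.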
CodePudding user response:
We may change the code to match one or more spaces (\\s+) followed by the opening parenthesis (\\(), one or more digits (\\d+) and any other characters (.*), and replace the match with an empty string ("")
colnames(data) <- sub("\\s+\\(\\d+.*", "", colnames(data))
colnames(data)
[1] "Subject" "ASE" "ASD" "AFD"
Another option is trimws from base R (the whitespace argument requires R 3.6.0 or later)
trimws(colnames(data), whitespace = "\\s+\\(.*")
[1] "Subject" "ASE" "ASD" "AFD"
In the OP's code, the pattern matches an upper case letter followed by a space. The ( is a metacharacter and is not escaped, so in regex mode it opens a capture group around the digits (([0-9]+)) instead of matching a literal parenthesis. This doesn't match the column names, because after the space there is a literal (, which the pattern does not match, so the same strings are returned
gsub("[A-Z] ([0-9]+)", "", colnames(data))
[1] "Subject" "ASE (232)" "ASD (121)" "AFD (313)"
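The difference between an unescaped and an escaped parenthesis can be checked directly. This is a small sketch using a pattern like the OP's; the vector nms is a hypothetical stand-in for the column names.

```r
# Hypothetical stand-in for two of the column names
nms <- c("Subject", "ASE (232)")

# Unescaped: ( ) form a capture group, so the pattern looks for
# an upper case letter, a space, then digits, which never occurs here
unescaped <- grepl("[A-Z] ([0-9]+)", nms)

# Escaped: \\( and \\) match the literal parentheses in the names
escaped <- grepl("[A-Z] \\([0-9]+\\)", nms)

# Removing the space and the parenthesised number with the escaped form
removed <- sub(" \\([0-9]+\\)", "", nms)

print(unescaped)
print(escaped)
print(removed)
```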
data
data <- structure(list(Subject = 1L, `ASE (232)` = "1.1.", `ASD (121)` = 1.2,
`AFD (313)` = 1.3), class = "data.frame", row.names = c(NA,
-1L))
CodePudding user response:
We could use word from the stringr package along with rename_with from dplyr:
library(stringr)
library(dplyr)
data %>%
rename_with(~word(., 1))
Subject ASE ASD AFD
1 1 1.1. 1.2 1.3
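If loading stringr and dplyr is not desirable, taking the first space-delimited word can be approximated in base R. This is only a sketch: first_word is a hypothetical helper name, and the small data frame is an illustrative stand-in for the real data.

```r
# Illustrative stand-in for the real data; check.names = FALSE keeps the
# spaces and parentheses in the column names
data <- data.frame(Subject = 1L, `ASE (232)` = 1.1, `ASD (121)` = 1.2,
                   check.names = FALSE)

# Hypothetical helper: split each name on single spaces and keep the first piece
first_word <- function(x) {
  vapply(strsplit(x, " ", fixed = TRUE), `[`, character(1), 1L)
}

names(data) <- first_word(names(data))
print(names(data))
```

Like word(., 1), this assumes the part to keep never contains a space.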