Delete different strings from column names in a data frame

I have a data frame with 50 columns whose names have this structure:

X..Age <- c(23, 34, 24, 10)
..Region <- c("A", "B","C","D")
X.span.style..display.none..Salary <- c(100,200, 300, 400)
X.....code <- c(14, 12, 13, 15)

DF <- data.frame(X..Age, ..Region,  X.span.style..display.none..Salary, X.....code)

I want to remove the strings X.., .., X.span.style..display.none.. & X..... from the column names. How do I go about this?

CodePudding user response：

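Use gsub and the regex alternation operator "|" to combine all the unwanted prefixes into a single pattern. Escape the dots (an unescaped . matches any character) and anchor the pattern at the start of the name:

df <- data.frame(X..Age=NA, ..Region=NA, X.span.style..display.none..Salary=NA, X.....code=NA, col_I_want=NA, col_to_keep=NA)
# Longest prefixes first, so "X.." cannot match the start of "X....."
remove_pattern <- c("X\\.span\\.style\\.\\.display\\.none\\.\\.", "X\\.{5}", "X\\.\\.", "\\.\\.")
# Join the alternatives with "|" and anchor them at the start of each name
remove_pattern <- paste0("^(", paste0(remove_pattern, collapse="|"), ")")
names(df) <- gsub(remove_pattern, "", names(df))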
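If writing the regular expression by hand feels error-prone, the same prefixes can also be removed with plain string comparison. The following is only a sketch: strip_prefixes() is a hypothetical helper (not part of base R or of the answer above) built on startsWith() and substring(), and it assumes the unwanted strings only ever appear at the start of a name.

```r
# Hypothetical helper: remove the first matching literal prefix from each name.
strip_prefixes <- function(nms, prefixes) {
  # Try longer prefixes first so "X....." wins over "X.."
  prefixes <- prefixes[order(nchar(prefixes), decreasing = TRUE)]
  for (i in seq_along(nms)) {
    for (p in prefixes) {
      if (startsWith(nms[i], p)) {
        nms[i] <- substring(nms[i], nchar(p) + 1)
        break  # stop after the first (longest) matching prefix
      }
    }
  }
  nms
}

nms <- c("X..Age", "..Region", "X.span.style..display.none..Salary",
         "X.....code", "col_I_want", "col_to_keep")
prefixes <- c("X..", "..", "X.span.style..display.none..", "X.....")

cleaned <- strip_prefixes(nms, prefixes)
print(cleaned)
```

Because the prefixes are compared literally, the dots need no escaping, and names such as col_I_want that start with none of the prefixes are left untouched.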

CodePudding user response：

Posting this as an answer rather than a comment under Soren's answer. It follows the same principle but uses a generalised pattern instead (delete everything up to and including the last ..):

names(DF) <- gsub(".*\\.\\.", "", names(DF))

DF
  Age Region Salary code
1  23      A    100   14
2  34      B    200   12
3  24      C    300   13
4  10      D    400   15
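One property of this pattern worth keeping in mind is that .* is greedy, so the match runs up to the last .. in a name, while names that contain no .. at all are left unchanged. A small sketch to illustrate; the name X..first..name is made up for the example and does not come from the question:

```r
# "X..first..name" is a hypothetical name with ".." in the middle
nms <- c("X..Age", "X.....code", "col_to_keep", "X..first..name")

# Greedy ".*" consumes everything up to and including the last ".."
cleaned <- gsub(".*\\.\\.", "", nms)
print(cleaned)
```

If some of your 50 columns legitimately contain .. in the middle of the name, this pattern would shorten them as well, so it is worth checking the result before overwriting names(DF).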

CodePudding user response：

Because these strings are the column names of the data.frame, you need some kind of name to replace them.

names(DF) <- NULL removes the names and sets them to "V1", "V2", ...

You could also assign a character vector with your new names, like this:

names(DF) <- c("col_name1","col_name2","col_name3","col_name4")

Or, if you want to remove the names completely, you could use a matrix instead of a data.frame, like this:

DF <- matrix(c(X..Age, ..Region,  X.span.style..display.none..Salary, X.....code),ncol=4)
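A short sketch of what the matrix approach gives you, using the vectors from the question. Note that c() combines everything into a single vector first, so the numeric columns are coerced to character alongside Region:

```r
# Vectors from the question
X..Age <- c(23, 34, 24, 10)
..Region <- c("A", "B", "C", "D")
X.span.style..display.none..Salary <- c(100, 200, 300, 400)
X.....code <- c(14, 12, 13, 15)

m <- matrix(c(X..Age, ..Region, X.span.style..display.none..Salary, X.....code),
            ncol = 4)

print(colnames(m))  # the matrix carries no column names
print(typeof(m))    # all cells share one type
print(m[1, 1])      # the first age is now a string
```

If you still need the numbers as numbers, a data.frame with cleaned names is probably the better fit than a matrix.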

CodePudding user response：

You could use str_extract to capture everything after the last dot, e.g.:

names(DF) <- stringr::str_extract(names(DF), "[^.]+$")

Output:

[1] "Age"    "Region" "Salary" "code"
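If you would rather not depend on stringr, the same idea can be sketched in base R. This assumes, as in the question, that every name ends in at least one non-dot character; regmatches() silently drops elements that do not match, so the sub() variant is the safer one to assign back to names(DF).

```r
nms <- c("X..Age", "..Region", "X.span.style..display.none..Salary", "X.....code")

# Extract the trailing run of non-dot characters
trailing <- regmatches(nms, regexpr("[^.]+$", nms))

# Or delete everything up to and including the last dot;
# this always returns one result per input name
stripped <- sub(".*\\.", "", nms)

print(trailing)
print(stripped)
```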