How to rename specific part of column names in R?

Time:07-28

I have some column names as follows jan_20_5, feb_20_7, mar_20_3, apr_20_9, etc.

I have 12 such columns and would like to eliminate everything after the second "_" and append something common by groups of three so that it outputs something like the following: jan_20_winter, feb_20_winter, mar_20_winter, apr_20_spring, may_20_spring, jun_20_spring, jul_20_summer etc.

How can I go about this in an efficient manner in R?

Thank you in advance!

CodePudding user response:

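You could use stringr: a regex matches the trailing digits of each name, and str_replace swaps them for the corresponding entry in a vector of seasons. The replacement is matched by position, so this assumes the 12 columns are in calendar order.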

library(stringr)

# Fake data
df <- data.frame(matrix(ncol = 12, nrow = 0))
colnames(df) <- paste(month.abb, 20:31, 4:15, sep = "_")

# Replacement
names(df) <- str_replace(names(df),
                         "\\d+$",
                         rep(c("winter", "spring", "summer", "autumn"), each = 3))

df

Output:

 [1] Jan_20_winter Feb_21_winter Mar_22_winter Apr_23_spring May_24_spring Jun_25_spring Jul_26_summer
 [8] Aug_27_summer Sep_28_summer Oct_29_autumn Nov_30_autumn Dec_31_autumn
<0 rows> (or 0-length row.names)
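
The same positional idea can be sketched in base R without stringr. This is only an illustration: the names cols, seasons and new_cols are made up for the example, and it assumes exactly 12 columns in calendar order, each ending in digits.

```r
# Hypothetical column names, one per month, in calendar order
cols <- paste(month.abb, 20, 4:15, sep = "_")

# One season label per column, in the same order as the columns
seasons <- rep(c("winter", "spring", "summer", "autumn"), each = 3)

# Drop the trailing digits (keeping the underscore), then append the
# season that sits at the same position
new_cols <- paste0(sub("[0-9]+$", "", cols), seasons)

new_cols[c(1, 4, 12)]
#> [1] "Jan_20_winter" "Apr_20_spring" "Dec_20_autumn"
```

Because the labels are matched by position, reordering or dropping a column would silently shift the seasons; a lookup keyed on the month name avoids that.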

CodePudding user response:

If every column name starts with a lowercase month abbreviation, you can look the season up by month name:

# example colnames
cn <- c("jan_20_5", "feb_20_7", "mar_20_3", "apr_20_9", "may_20_18", "jun_20_8", "jul_20_4", "aug_20_7", "sep_20_5", "oct_20_4")


labl <- setNames(
 gl(4,3, labels = c("winter", "spring", "summer", "fall")),
 tolower(month.abb)
)

paste0(gsub("(.*_)(.*)", "\\1", cn), labl[gsub("_.*", "", cn)])
#>  [1] "jan_20_winter" "feb_20_winter" "mar_20_winter" "apr_20_spring"
#>  [5] "may_20_spring" "jun_20_spring" "jul_20_summer" "aug_20_summer"
#>  [9] "sep_20_summer" "oct_20_fall"

Created on 2022-07-27 by the reprex package (v2.0.1)

CodePudding user response:

You can make a simple mapping from tolower(month.abb) to season (in the example below, months 1:3, 4:6, 7:9, and 10:12 map to winter, spring, summer, and fall, but you can adjust this), and then replace the column names as follows:

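library(stringr)

# Map each lowercase month abbreviation to its season
season <- setNames(rep(c("winter", "spring", "summer", "fall"), each = 3), tolower(month.abb))

# Strip the trailing "_<digits>" from each name of your data frame `data`,
# then append the season looked up by the three-letter month prefix
names(data) <- sapply(str_remove(names(data), "_[0-9]{1,2}$"), \(k) {
  paste0(k, "_", season[[substr(k, 1, 3)]])
})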
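
As a quick check of the lookup idea, here is a small base R sketch on a made-up data frame whose columns are deliberately out of calendar order. The data frame and the helper variable stem are hypothetical, and sub() stands in for str_remove() so the snippet has no package dependency.

```r
# Hypothetical data frame; columns are not in calendar order
data <- data.frame(jul_20_4 = 1, jan_20_5 = 2, oct_20_11 = 3)

# Month abbreviation -> season lookup (adjust the grouping as needed)
season <- setNames(rep(c("winter", "spring", "summer", "fall"), each = 3),
                   tolower(month.abb))

# Remove the trailing "_<digits>", leaving e.g. "jul_20"
stem <- sub("_[0-9]+$", "", names(data))

# Look each season up by the three-letter month prefix
names(data) <- paste0(stem, "_", season[substr(stem, 1, 3)])

names(data)
#> [1] "jul_20_summer" "jan_20_winter" "oct_20_fall"
```

Since the season is looked up by month name rather than by position, the result does not depend on the column order or on all 12 months being present.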