I have some column names as follows: jan_20_5, feb_20_7, mar_20_3, apr_20_9, etc.
I have 12 such columns and would like to remove everything after the second "_" and append a common suffix in groups of three, so that the result looks like the following: jan_20_winter, feb_20_winter, mar_20_winter, apr_20_spring, may_20_spring, jun_20_spring, jul_20_summer, etc.
How can I go about this in an efficient manner in R?
Thank you in advance!
CodePudding user response:
You could use stringr with a regex that matches the trailing digits of each name, and replace that match with a vector of the seasons:
library(stringr)
# Fake data
df <- data.frame(matrix(ncol = 12, nrow = 0))
colnames(df) <- paste(month.abb, 20:31, 4:15, sep = "_")
# Replacement
names(df) <- str_replace(names(df),
                         "\\d+$",  # one or more digits at the end of the name
                         rep(c("winter", "spring", "summer", "autumn"), each = 3))
df
Output:
[1] Jan_20_winter Feb_21_winter Mar_22_winter Apr_23_spring May_24_spring Jun_25_spring Jul_26_summer
[8] Aug_27_summer Sep_28_summer Oct_29_autumn Nov_30_autumn Dec_31_autumn
<0 rows> (or 0-length row.names)
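For comparison, here is a minimal base-R sketch of the same idea that avoids the stringr dependency. Like the code above, it assumes the columns are already in calendar order (Jan to Dec), because the seasons are matched by position. The vector old_names and its trailing numbers are made up for illustration.

```r
# Hypothetical column names in the question's format (trailing numbers are made up)
old_names <- paste(tolower(month.abb), 20,
                   c(5, 7, 3, 9, 18, 8, 4, 7, 5, 4, 11, 2), sep = "_")

# One season label per column, matched by position (assumes Jan..Dec order)
seasons <- rep(c("winter", "spring", "summer", "autumn"), each = 3)

# Drop the trailing digits (the underscore before them stays), then append the label
new_names <- paste0(sub("\\d+$", "", old_names), seasons)
new_names
```

With a real data frame you would assign the result back with names(df) <- new_names.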
CodePudding user response:
Depending on the formatting, this may help you:
# example colnames
cn <- c("jan_20_5", "feb_20_7", "mar_20_3", "apr_20_9", "may_20_18", "jun_20_8", "jul_20_4", "aug_20_7", "sep_20_5", "oct_20_4")
labl <- setNames(
  gl(4, 3, labels = c("winter", "spring", "summer", "fall")),
  tolower(month.abb)
)
paste0(gsub("(.*_)(.*)", "\\1", cn), labl[gsub("_.*", "", cn)])
#> [1] "jan_20_winter" "feb_20_winter" "mar_20_winter" "apr_20_spring"
#> [5] "may_20_spring" "jun_20_spring" "jul_20_summer" "aug_20_summer"
#> [9] "sep_20_summer" "oct_20_fall"
Created on 2022-07-27 by the reprex package (v2.0.1)
CodePudding user response:
You can make a simple mapping from tolower(month.abb) to season (in the example below I've mapped months 1:3, 4:6, 7:9, and 10:12 to winter, spring, summer, and fall, but you may adjust this), and then replace the column names as below:
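library(stringr)  # provides str_remove()
# `data` is your existing data frame; \(k) needs R >= 4.1
season <- setNames(rep(c("winter", "spring", "summer", "fall"), each = 3), tolower(month.abb))
names(data) <- sapply(str_remove(names(data), "_[0-9]{1,2}$"), \(k) {
  paste0(k, "_", season[[substr(k, 1, 3)]])
})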
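To try the lookup approach without an existing data frame, here is a small self-contained sketch in base R. The data frame and its values are hypothetical, and the columns are deliberately out of calendar order to show that matching on the month prefix does not depend on column position.

```r
# Lookup from lower-case month abbreviation to season (same grouping as above)
season <- setNames(rep(c("winter", "spring", "summer", "fall"), each = 3),
                   tolower(month.abb))

# Hypothetical data frame with columns out of calendar order
data <- data.frame(jul_20_4 = 1, jan_20_5 = 2, oct_20_12 = 3, apr_20_9 = 4)

# Drop the final "_<digits>" part, leaving e.g. "jul_20"
stem <- sub("_[0-9]+$", "", names(data))

# Look up the season by the first three letters and append it
names(data) <- paste0(stem, "_", season[substr(stem, 1, 3)])
names(data)
# jul_20_summer, jan_20_winter, oct_20_fall, apr_20_spring
```

Because the season is looked up by name, this also works when some months are missing, as long as every column starts with a lower-case three-letter month abbreviation.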