Let's say I have a data frame that looks like this:
Column1, Column2, Column3
a_2019 b_2020 c_2021
d_2019 e_2020 f_2021
a_2019 b_2020 c_2021
d_2019 e_2020 f_2021
I would like to strip the "_2019", "_2020", and "_2021" suffixes. I could use
df$Column1 <- substr(df$Column1, 1, nchar(df$Column1)-5)
for every column, but I have multiple data frames with quite a few columns. substr needs a character vector to work on, so passing df[,3:10] doesn't work, and lapply didn't work for me either.
Any suggestions on how to achieve this in an elegant way? Thank you.
CodePudding user response:
We can try using lapply along with sub for a base R option:
df[cols] <- lapply(df[cols], function(x) sub("_(?:2019|2020|2021)$", "", x))
Here cols should be a vector containing the names of the columns on which you want to make the replacement.
More generally, to target underscore followed by any number, we can use:
df[cols] <- lapply(df[cols], function(x) sub("_\\d+$", "", x)) # or "_\\d{4}$" for a four-digit year
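As a rough end-to-end sketch, here is the same idea applied to the sample data from the question, and then to several data frames at once. The helper name strip_suffix and the list names first and second are made up for illustration, and the default of cleaning every column is an assumption; pass cols explicitly if only some columns carry the suffix.

```r
# Sample data reconstructed from the question
df <- data.frame(
  Column1 = c("a_2019", "d_2019", "a_2019", "d_2019"),
  Column2 = c("b_2020", "e_2020", "b_2020", "e_2020"),
  Column3 = c("c_2021", "f_2021", "c_2021", "f_2021"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: remove a trailing "_<digits>" from the chosen columns.
# By default (an assumption) it processes every column of the data frame.
strip_suffix <- function(d, cols = names(d)) {
  d[cols] <- lapply(d[cols], function(x) sub("_\\d+$", "", x))
  d
}

# One data frame
cleaned <- strip_suffix(df)

# Several data frames: keep them in a list and apply the helper to each
dfs <- list(first = df, second = df[c("Column1", "Column2")])
cleaned_list <- lapply(dfs, strip_suffix)
```

Assigning back into d[cols] rather than overwriting d keeps the result a data frame and leaves any columns outside cols untouched.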